Parallelism From Functional Decomposition

ABSTRACT

A system and method for performing functional decomposition of a software design to generate a computer-executable finite state machine. Initially, the software design is received in a form wherein functions in the software design are repetitively decomposed into data transformations and control transformations. Included between the functions are control flow indicators which have transformation-selection conditions associated therewith. The data transformations and the control transformations are translated into states in the finite state machine. The transformation-selection conditions associated with the control transformations are translated into state transitions in the finite state machine.

RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 13/425,136, filed Mar. 20, 2012, which is incorporated by reference herein in its entirety.

BACKGROUND

Traditional models for functional decomposition of algorithms are vague in their definition of lower decomposition levels. In the Yourdon structured model, control transformations decompose into state transition diagrams, which represent the real-time aspects of the system. Although control transformations were used by Yourdon, Ward and Mellor, and Hatley and Pirbhai to define real-time control transformation events, their definition of control transformation does not include any of the following types of software statements: goto, if-then-else, switch, loops, and subroutine calls.

If the transformations decompose from the highest to the lower levels but the developer does not constrain complexity as the functionality decomposes, as in the McCabe model, the amount of control is unconstrained and it is not clear when the decomposition should end. Furthermore, since unconstrained decomposition does not inherently simplify the design, it does not actually meet the criteria of mathematical functional decomposition.

SOLUTION

To eliminate the above-noted shortcomings of previous decomposition methods, a simple graph, created in accordance with the multiprocessor functional decomposition (MPfd) model described herein, is constrained to a single control structure per decomposition level and exposes all transitions, preparing the graph for translation into a finite state machine (FSM).

Traditionally, FSMs have been used to create compilers and in sequential circuit design. Being able to use FSMs in general software design, and thus in general programming, offers substantial benefits, including increased software clarity and the ability to better combine computer software with computer hardware. That is, instead of simply compiling a source program into an FSM as part of generating executable code, it is highly advantageous to use the FSM, for example by manipulating a graphical depiction of it, when generating the software design.

Disclosed herein are a system and method for performing functional decomposition of a software design to generate a computer-executable finite state machine. Initially, the software design is received in a form wherein functions in the software design are repetitively decomposed into data transformations and control transformations. Included between the functions are control flow indicators which have transformation-selection conditions associated therewith. The data transformations and the control transformations are translated into states in the finite state machine. The transformation-selection conditions associated with the control transformations are translated into state transitions in the finite state machine.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a system diagram showing an exemplary computing environment in which the present system functions.

FIG. 2 is a prior art standard functional decomposition diagram.

FIG. 3 shows an example of multiple threads from decomposition of a function with dissimilar parameters.

FIG. 4 shows an example of functional decomposition with transition conditions and threads.

FIG. 5 shows an example of functional decomposition with conditions, threads and added loops.

FIG. 6 is an example illustrating the highest level decomposition (level 0).

FIG. 6A is a flowchart showing an exemplary algorithm for converting an MPfd to a finite state machine.

FIG. 7 shows an exemplary functional decomposition diagram.

FIG. 8 shows a finite state machine view of the translation of a single-process bubble into its state machine equivalent.

FIG. 9 shows an exemplary lower level decomposition diagram, functional decomposition view.

FIG. 10 shows an exemplary lower level decomposition diagram, finite state machine view.

FIG. 11 shows multiple loops, functional decomposition view.

FIG. 12 shows an example of multiple loops, finite state machine view.

FIG. 13 shows an example of a loop with label, functional decomposition view.

FIG. 14 shows an example of a loop with label, finite state machine view.

FIG. 15 shows an example of multiple data on lines and multiple conditions on transition.

FIG. 16 shows an example of transition and data lines using labels.

FIG. 17 is an exemplary lower level decomposition diagram with composite variable names, functional decomposition view.

FIG. 18 is an exemplary lower level decomposition diagram without composite array names and dimensionality.

FIG. 19 is an exemplary lower level decomposition diagram with composite array names and dimensionality.

FIG. 20 is an exemplary lower level decomposition diagram with composite matrix names with multiple dimensions.

FIG. 21 shows an example of associated bubbles linked via control flows.

FIG. 22 shows an example of unassociated bubbles.

FIG. 23 shows an example of a data-associated bubble.

FIG. 24 shows an example of control-linked, unassociated level-2 bubbles.

FIG. 25 shows an example of transformation to standard unassociated form.

FIG. 26 shows an example of transformation to standard associated form.

FIG. 27 shows an example of unassociated process bubbles translated into a finite state machine that indicates task parallelism.

FIG. 28 shows an example of transpose notation, functional decomposition view.

FIG. 29 shows an example of transpose notation, finite state machine view.

FIG. 30 shows an example of scatter/gather notation, functional decomposition view.

FIG. 31 shows an example of scatter/gather, finite state machine view.

FIG. 32 shows an example of parallel I/O indication.

FIG. 33 shows an example of selecting particular matrix elements.

FIGS. 34A and 34B show examples of incomplete decomposition.

FIG. 35 shows an example of a 1-dimensional monotonic workload symbol, functional decomposition view.

FIG. 36 shows an example of a 1-dimensional monotonic workload symbol, finite state machine view.

FIG. 37 shows an example of a 2-dimensional monotonic workload symbol, functional decomposition view.

FIG. 38 shows an example of a 2-dimensional monotonic workload symbol, finite state machine view.

FIG. 39 shows an example of a 3-dimensional monotonic workload symbol, functional decomposition view.

FIG. 40 shows an example of a 3-dimensional monotonic workload symbol, finite state machine view.

FIG. 41 shows an example of a left-right exchange symbol (no stride), functional decomposition view.

FIG. 42 shows an example of a left-right exchange symbol (no stride), finite state machine view.

FIG. 43 shows an example of a left-right exchange (with stride), functional decomposition view.

FIG. 44 shows an example of a left-right exchange (with stride), finite state machine view.

FIG. 45 shows an example of a next-neighbor exchange symbol (no stride), functional decomposition view.

FIG. 46 shows an example of a next-neighbor exchange (no stride), finite state machine view.

FIG. 47 shows an example of a next-neighbor exchange symbol (with stride), functional decomposition view.

FIG. 48 shows an example of a next-neighbor exchange (with stride), finite state machine view.

FIG. 49 shows an example of a 3-dimensional next-neighbor exchange symbol (no stride), functional decomposition view.

FIG. 50 shows an example of a 3-dimensional next-neighbor exchange (no stride), finite state machine view.

FIG. 51 shows an example of a 3-dimensional next-neighbor exchange symbol (with stride), functional decomposition view.

FIG. 52 shows an example of a 3-dimensional next-neighbor exchange (with stride), finite state machine view.

FIG. 53 shows an example of a 2-dimensional matrix with a 2-dimensional stencil for a 2-D next-n-neighbor exchange symbol (no stride), functional decomposition view.

FIG. 54 shows an example of a 2-dimensional matrix with a 2-dimensional stencil for a 2-D next-n-neighbor exchange (no stride), finite state machine view.

FIG. 55 shows an example of a 2-dimensional matrix with a 2-dimensional stencil for a 2-D next-n-neighbor exchange symbol (with stride), functional decomposition view.

FIG. 56 shows an example of a 2-dimensional matrix with a 2-dimensional stencil for a 2-D next-n-neighbor exchange (with stride), finite state machine view.

FIG. 57 shows an example of a 1-dimensional all-to-all exchange symbol (no stride), functional decomposition view.

FIG. 58 shows an example of a 1-dimensional all-to-all exchange (no stride), finite state machine view.

FIG. 59 shows an example of a 1-dimensional all-to-all exchange symbol (with stride), functional decomposition view.

FIG. 60 shows an example of a 1-dimensional all-to-all exchange (with stride), finite state machine view.

FIG. 61 shows an example of a 2-dimensional all-to-all exchange symbol (no stride), functional decomposition view.

FIG. 62 shows an example of a 2-dimensional all-to-all exchange (no stride), finite state machine view.

FIG. 63 shows an example of a 2-dimensional all-to-all exchange symbol (with stride), functional decomposition view.

FIG. 64 shows an example of a 2-dimensional all-to-all exchange (with stride), finite state machine view.

FIG. 65 shows an example of a 3-dimensional all-to-all exchange symbol (no stride), functional decomposition view.

FIG. 66 shows an example of a 3-dimensional all-to-all exchange (no stride), finite state machine view.

FIG. 67 shows an example of a 3-dimensional all-to-all exchange symbol (with stride), functional decomposition view.

FIG. 68 shows an example of a 3-dimensional all-to-all exchange (with stride), finite state machine view.

DETAILED DESCRIPTION

Although functional decomposition has long been used to design software, the multiprocessor functional decomposition (MPfd) techniques and methods described herein extend beyond mere design. First, any design created using the presently described MPfd methods can, by definition, be translated directly into a finite state machine (FSM). Since field programmable gate arrays (FPGAs) and graphics processing units (GPUs) use FSMs in their programming, the MPfd is useful in creating not only CPU code but also GPU and FPGA code. Second, incorrect MPfd structures can be automatically detected and corrected. Third, MPfd techniques incorporate the automatic selection of the pass-by-value or the pass-by-reference data movement model for moving data between functional elements. This allows the presently described system to combine computer languages like “C” and “C++” with other computer languages like Fortran or Java. Fourth, MPfd elements are annotated with information concerning the use of any data, not just the data type. Using the MPfd model to automatically find task-level and non-task-level parallelism from the design, instead of the user finding it within the code, allows separate compute threads to process data simultaneously.

Since a task in the present system is equivalent to one or more data transformations (or simply “transformations”), and since a transformation is a state in the present finite state machine (FSM), showing which states can be executed in parallel is equivalent to indicating the task parallelism.

DEFINITIONS

For the purpose of this document, the following definitions are supplied to provide guidelines for interpretation of the terms below as used herein:

Function: A software routine, or more simply an algorithm, that performs one or more transformations.

Control Kernel: A control kernel is a software routine or function that contains only the following types of computer-language constructs: subroutine calls, looping statements (for, while, do, etc.), decision statements (if-then-else, etc.), and branching statements (goto, jump, continue, exit, etc.).

Process Kernel: A process kernel is a software routine or function that does not contain any of the following types of computer-language constructs: subroutine calls, looping statements, decision statements, or branching statements. Information is passed to and from a process kernel via RAM.

State Machine: The state machine employed herein is a two-dimensional network that links together all associated control kernels into a single non-language construct that provides for the activation of process kernels in the correct order. The process kernels form the “states” of the state machine, while the activation of those states forms the state transitions. This eliminates the need for software linker-loaders.

State Machine Interpreter: For the purpose of the present document, a State Machine Interpreter is a method whereby the states and state transitions of a state machine are used as active software, rather than as documentation.

Node: A node is a processing element comprising a processing core, or processor, memory, and communication capability.

Data transformation: A data transformation is a task that accepts data as input and transforms the data to generate output data.

Control transformation: A control transformation evaluates conditions and sends and receives control to/from other control transformations and/or data transformations.

Control bubble: A control bubble is a graphical indicator of a control transformation. A control bubble symbol indicates a structure that performs only transitions and does not perform processing.

Process bubble: A process bubble is a graphical indicator of a data transformation.

Finite state machine: A finite state machine is an executable program constructed from the linear code blocks resulting from transformations, where the transformation-selection conditions are state transitions constructed from the control flow.

Graphics: Graphics means both the display by a computer of concepts, such as a software design and FSM, using visible objects, and the ability of a user to manipulate those objects in order to manipulate the concepts represented by the graphics.

Computing Environment

FIG. 1 is an exemplary diagram of the computing environment in which the present system and method operate. As shown in FIG. 1, system 100 includes a processor 101 which executes tasks and programs including a kernel management module 110, an algorithm management module 105, state machine 124, a kernel execution module 130, and an algorithm execution module 125. System 100 further includes storage 107, which stores data including libraries 115/120, which respectively store algorithms 117 and kernels 122. Storage 107 may be RAM, or a combination of RAM and other storage such as a disk drive. Module 102 translates a graphical input functional decomposition diagram 700 (see, e.g., FIG. 7) into corresponding MPfd functions (ultimately, states in a state machine) and stores the translated functions in appropriate libraries in storage area 108. Module 103 generates appropriate FSMs from the translated functions.

System 100 is coupled to a host management system 145, which provides management of system functions and issues system requests. Algorithm execution module 125 initiates execution of kernels invoked by algorithms that are executed. Algorithm execution system 135 may comprise any computing system with multiple computing nodes 140 which can execute kernels stored in system 100. Management system 145 can be any external client computer system which requests services from the present system 100. These services include requesting that kernels or algorithms be added to, changed in, or deleted from a respective library within the current system. In addition, the external client system can request that a kernel/algorithm be executed. It should be noted that the present system is not limited to the specific file names, formats and instructions presented herein.

A kernel is an executable computer program or program segment that contains data transformation code and no program execution control code, where execution control code is any code that can change which code is to be executed next. In the exemplary embodiment described herein, kernels 122 are stored in a kernel library file 121 in kernel library 120.

An algorithm is a state machine that comprises states (kernel invocations) and state transitions (the conditions needed to go from one state to another). References to the “system” in this section refer in general to system 100 and, in applicable embodiments, to algorithm management module 105. Each algorithm 117 is kept in an algorithm definition file 116 in algorithm library 115 with a name (Algorithm_Title) that is the concatenation of the organization name, the category name, the algorithm name, and the user name, with a ‘_’ character between each of the names.
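As a minimal illustration of the naming convention just described, the following Python sketch builds an Algorithm_Title from its four parts. The function name `algorithm_title` and the sample names are hypothetical; the source does not say how an underscore already present inside one of the name parts is handled, so this sketch does not address that case.

```python
def algorithm_title(organization: str, category: str, algorithm: str, user: str) -> str:
    """Concatenate the organization, category, algorithm and user names,
    placing a '_' character between each pair (hypothetical helper)."""
    return "_".join((organization, category, algorithm, user))


print(algorithm_title("AcmeLabs", "ImageProcessing", "EdgeDetect", "jsmith"))
# AcmeLabs_ImageProcessing_EdgeDetect_jsmith
```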

Algorithm Definition File with Task Parallelism Example:

StateNumber[(state1, . . . , state n), state x, state y, state z], KernelID(nodeInfo)(InputDatasets)(OutputDatasets)(Transitions)(Loops)

In the above example, the parallel tasks are executed at the same time as “StateNumber”.

Functional Decomposition

A control transformation evaluates conditions and sends and receives control. One primary difference between the Yourdon model and the present MPfd model is in how control transformations are handled. MPfd allows a control transformation to contain non-event control items. Non-event control items are conditions that change the sequence of execution of a program (if-then-else, goto, function calls, function returns), and a condition is a regular conditional expression.

Variables used by a control transformation can only be used in a condition; they cannot be transformed into any other value. An Invoke instruction initiates system operation; variables and constants are used in conditions to transition to a control transformation; and a Return instruction gives control back to the control transformation with the name of the returning routine. A control transformation can have only one selection condition per transformation, and there can be at most one control transformation per decomposition level.

The MPfd model creates hierarchical finite state machines (HFSMs) whose state transitions have conditions and whose states are data transformations and control transformations. Data transformations can always, eventually, be associated with linear code blocks, while control transformations contain only transitions, with no associated code blocks.

Data transformations represent the parallelizable portion of the software design. In MPfd designs, there are three data transformation types: associated, unassociated, and ambiguous. These types are concerned with the relationship between an upper-level transformation and its immediate next-level decomposition.

Associated transformations are grouped together and share data and/or control. Unassociated transformations are grouped together but share no data or control. Unassociated transformations can be executed in parallel; this is called task-level parallelization. Ambiguous transformations can always be converted to either associated or unassociated forms.
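The practical difference between the two forms can be sketched in Python, assuming each transformation is modeled as a plain one-argument function (an illustrative simplification, not the MPfd representation). Associated transformations are chained in order because each consumes the previous one's output; unassociated ones are submitted independently to a thread pool. Note that CPython threads only overlap for I/O-bound or GIL-releasing work; `ProcessPoolExecutor` is the usual substitute for CPU-bound kernels.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Transform = Callable[[int], int]


def run_associated(transforms: Sequence[Transform], data: int) -> int:
    """Associated transformations share data, so they run serially:
    each one consumes the previous one's output."""
    for transform in transforms:
        data = transform(data)
    return data


def run_unassociated(transforms: Sequence[Transform], inputs: Sequence[int]) -> List[int]:
    """Unassociated transformations share no data or control, so each one is
    submitted independently (task-level parallelism)."""
    with ThreadPoolExecutor(max_workers=max(1, len(transforms))) as pool:
        futures = [pool.submit(t, x) for t, x in zip(transforms, inputs)]
        return [f.result() for f in futures]


def double(x: int) -> int:
    return 2 * x


def increment(x: int) -> int:
    return x + 1


print(run_associated([double, increment], 5))         # 11
print(run_unassociated([double, increment], [5, 7]))  # [10, 8]
```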

A data transformation can contain three types of looping structures: pre-work, post-work and recursion. Pre-work means that the loop-ending condition is checked prior to performing the work; it is denoted by a downward-pointing solid-loop symbol on a transformation. Post-work means that the loop-ending condition is checked after performing the work; it is denoted by an upward-pointing solid-loop symbol on a transformation. Recursion means that the transformation calls itself; it is denoted by a downward-pointing dashed-loop symbol on a transformation.
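The three looping structures differ only in where the loop-ending condition is evaluated relative to the work. A minimal Python sketch, in which `done` stands for the loop-ending condition and `work` for the body of the transformation (both names are hypothetical):

```python
from typing import Callable, TypeVar

S = TypeVar("S")


def pre_work(state: S, done: Callable[[S], bool], work: Callable[[S], S]) -> S:
    """Pre-work loop: test the ending condition first, so work may run zero times."""
    while not done(state):
        state = work(state)
    return state


def post_work(state: S, done: Callable[[S], bool], work: Callable[[S], S]) -> S:
    """Post-work loop: do the work first, then test, so work runs at least once."""
    while True:
        state = work(state)
        if done(state):
            return state


def recursive(state: S, done: Callable[[S], bool], work: Callable[[S], S]) -> S:
    """Recursion: the transformation calls itself until the ending condition holds."""
    if done(state):
        return state
    return recursive(work(state), done, work)


def at_least_10(s: int) -> bool:
    return s >= 10


def add_one(s: int) -> int:
    return s + 1


print(pre_work(10, at_least_10, add_one))   # 10 (no iterations)
print(post_work(10, at_least_10, add_one))  # 11 (one iteration)
print(recursive(7, at_least_10, add_one))   # 10
```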

In the Yourdon model, only the control transformation decomposes into a finite state machine (FSM). In an MPfd design, the entire diagram of the current decomposition level is converted into an FSM.

The lowest level of transformation decomposition represents a linear code block. Decomposition ends when a data transformation cannot decompose into a set of data transformations grouped together with a control transformation, or when the decomposition results in the same graph as the decomposed transformation.

Equation 1 (Mathematics of Functional Decomposition):

$$y = f(a, b, c, \ldots) = g\big(h_1(h_2(a, b), c),\ h_3(d, h_4(e), f),\ \ldots,\ h_n(a, b, c, \ldots)\big)$$

In the example of Equation 1 above, the “hx( )” functions can also be decomposed, and this decomposition can continue. In standard decomposition, there is no specified last decomposition. In an MPfd, the decomposition continues until only a series of function calls depicting the structure of the function remains. A final decomposition then occurs when there are no function calls and only a single data transformation remains. At that point, the decomposition has progressed to the kernel level, with the non-transformation functions equivalent to control kernels and the transformation-only functions equivalent to process kernels. By its nature, an MPfd forms a disjoint, fully reduced set of functions.
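To make Equation 1 concrete, the sketch below picks arbitrary, hypothetical bodies for the component functions, keeps only three top-level arguments to g (the h1 term, the h3 term and a single hn term, so the elided arguments are dropped), and checks that the decomposed form computes the same value as a monolithic version. In the spirit of the final MPfd decomposition, `f_decomposed` contains only function calls, like a control kernel, while each component is a single straight-line expression, like a process kernel. The parameter named f in Equation 1 is spelled `f_` here to avoid clashing with the function names.

```python
# Hypothetical component functions; any straight-line bodies would do.
def h2(a, b):
    return a + b


def h1(x, c):
    return x * c


def h4(e):
    return e - 1


def h3(d, x, f_):
    return d + x + f_


def hn(a, b, c):
    return a * b * c


def g(u, v, w):
    return u + v + w


def f_decomposed(a, b, c, d, e, f_):
    """Equation 1: only function calls, mirroring the structure of g(...)."""
    return g(h1(h2(a, b), c), h3(d, h4(e), f_), hn(a, b, c))


def f_direct(a, b, c, d, e, f_):
    """The same computation written as one un-decomposed expression."""
    return (a + b) * c + (d + (e - 1) + f_) + a * b * c


print(f_decomposed(1, 2, 3, 4, 5, 6), f_direct(1, 2, 3, 4, 5, 6))  # 29 29
```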

Function Dependencies

Transforming a function into its decomposed equivalent set of functions means hierarchically identifying functions within functions such that the equivalent functionality of the original function is maintained while the complexity of the component functions simplifies. This can be illustrated using the “g( )” function from Equation 1. The function g(h1(h2(a, b), c), h3(d, h4(e), f), . . . , hn(a, b, c, . . . )) uses the various “hx( )” functions as its parameters. The “hx( )” functions can, thus, be ordered by the “g( )” function in the same way as variables are ordered within a function. If some or all of the “hx( )” functions were also decomposed, they would have the decomposed functions as additional parameters. Unfortunately, the standard decomposition diagram notation does not make this functional ordering fully visible; usually, the ordering is bound in the mathematics of “g( )”.

FIG. 2, a standard, prior art functional decomposition diagram, shows the functional ordering that a standard decomposition of “g( )” might give. The function-order arrows (control flow indicators) on the standard functional decomposition diagram of FIG. 2 indicate the calling order of the functions. This calling order comes from a combination of the decomposition level (indicated by the level number shown on the diagram) and the parameter order of the functions. If the parameters used by some functions are different from those used by other functions, those disjoint functions can be executed in parallel. The functions that share the same parameters are said to be joint and are executed serially.
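The joint/disjoint rule can be sketched as a grouping problem. Assuming each function is described only by the set of parameter names it uses (a simplification of the diagram), the Python sketch below uses a union-find structure to place functions that share a parameter, directly or through a chain of shared parameters, into the same joint group; separate groups share no parameters and are candidates for parallel threads. Treating transitive sharing as joint is an assumption of this sketch, and the function name is hypothetical.

```python
from typing import Dict, List, Set


def group_by_shared_parameters(functions: Dict[str, Set[str]]) -> List[List[str]]:
    """Partition functions into joint groups.

    `functions` maps a function name to the parameter names it uses. Functions
    sharing a parameter (directly or via a chain) end up in one group; groups
    share no parameters, so they are candidates for separate parallel threads.
    """
    parent = {name: name for name in functions}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    first_user: Dict[str, str] = {}  # parameter -> first function seen using it
    for name, params in functions.items():
        for param in params:
            if param in first_user:
                parent[find(name)] = find(first_user[param])  # merge the two groups
            else:
                first_user[param] = name

    groups: Dict[str, List[str]] = {}
    for name in functions:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


print(group_by_shared_parameters({
    "h1": {"a", "b", "c"}, "h2": {"a", "b"},
    "h3": {"d", "e", "f"}, "h4": {"e"},
}))  # [['h1', 'h2'], ['h3', 'h4']]
```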

In order to create different joint execution streams in accordance with the present MPfd model, each function in a particular algorithm receives an execution-stream identifier. In the present exemplary embodiment, this execution-stream identifier is represented as a program thread. Graphically, this MPfd-type decomposition takes the form shown in FIG. 3, which shows multiple threads from the decomposition of a function with dissimilar parameters. Examining FIG. 3, it can be seen that thread 1 is used to coordinate the parallel execution of threads 2 and 3. Within threads 2 and 3, the thread-sharing functions share variables and are linear to each other, but threads 2 and 3 clearly do not share data with each other. Since there are no linear dependencies between thread 2 and thread 3 and no shared data, the two threads can be executed simultaneously.

Conditions for Transition

In a standard functional decomposition diagram, the function-order arrows contain no information other than indicating a general relationship. In the present system, a condition is added to the function-order arrows, and this additional information can be used to identify additional parallelism. The MPfd control flow indicators each comprise a function-order arrow plus an associated condition. Adding function-calling or transition information to a function-order arrow is a way to graphically depict the circumstances under which a function is called; that is, it shows the underlying logical/mathematical rationale for transitioning to another function. For example, separate threads containing functions with the same parameters can be identified if their transition conditions are different, as shown in FIG. 4, which shows an example of functional decomposition with transition conditions and threads.

When the various function-order arrows indicate the transition conditions, they can be thought of as state-transition vectors. If one ignores the variables, the called functions can be thought of as states. Note that the transitions shown in FIG. 4 are of two types: conditional from a calculation, and conditional because a particular function has completed. Both types are necessary.
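A minimal Python sketch of this view, under the assumption that states are functions acting on a shared variable dictionary and that each transition carries a condition. A completion-type transition is modeled as a condition that is always true (it fires as soon as the source function returns), and a calculation-type transition as a predicate on the variables. Taking the first satisfied transition in list order is a choice of this sketch; the class and field names are illustrative, not taken from the source.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Variables = Dict[str, int]


@dataclass
class Transition:
    target: str
    condition: Callable[[Variables], bool]  # evaluated after the source state runs


@dataclass
class StateMachine:
    states: Dict[str, Callable[[Variables], None]]  # state name -> called function
    transitions: Dict[str, List[Transition]] = field(default_factory=dict)

    def run(self, start: str, variables: Variables, max_steps: int = 1000) -> List[str]:
        """Run states until no transition condition holds; return the visited states."""
        trace: List[str] = []
        current: Optional[str] = start
        while current is not None and len(trace) < max_steps:
            self.states[current](variables)
            trace.append(current)
            # Take the first outgoing transition whose condition is satisfied.
            current = next(
                (t.target for t in self.transitions.get(current, []) if t.condition(variables)),
                None,
            )
        return trace


def completed(_: Variables) -> bool:
    return True  # completion-type transition: fires once the function returns


fsm = StateMachine(
    states={
        "init": lambda v: v.update(x=0),
        "step": lambda v: v.update(x=v["x"] + 4),
        "report": lambda v: v.update(done=1),
    },
    transitions={
        "init": [Transition("step", completed)],
        "step": [
            Transition("step", lambda v: v["x"] < 10),     # calculation-type
            Transition("report", lambda v: v["x"] >= 10),  # calculation-type
        ],
    },
)
print(fsm.run("init", {}))  # ['init', 'step', 'step', 'step', 'report']
```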

Multiple Threads as Nested Finite State Machines

Since parameters are a part of the function, they can be considered part of the state. Thus, the present functional decomposition with conditions and threads is functionally equivalent to a finite state machine. Furthermore, since each thread is separate from all other threads and each thread consists only of states and transitions, the threads act as their own state machines. Finally, since the threads are hierarchically formed, they depict nested finite state machines.

Loops

As previously indicated, function transitions containing one of two types of transition conditions are required to externalize the control elements of functions, allowing them to be gathered together as threads. It is also clear that the transition is a separate entity type from the functions themselves. Loops, or looping structures, can be thought of as special, more generalized cases of function transition. Whereas a function transition contains only a condition, a looping structure contains a loop order, an initial loop-index value, a loop-index change calculation, and a loop-ending calculation.

FIG. 5 shows an exemplary functional decomposition with conditions, threads and added loops. The example in FIG. 5 shows three loops: a single loop for a specific function, an outer loop across functions, and an inner loop. The loop across functions can be used to loop at the thread level. An inner loop, indicated by having the lowest number in a multiple-loop system, is incremented first, with subsequent numbers then incremented in successive order. It should be noted that it is not possible to loop between threads.
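The four loop fields listed above, and the rule that the lowest-numbered loop is incremented first, can be sketched in Python as follows. The `LoopStructure` name and the choice to test the ending condition before each pass (pre-work style) are assumptions of this sketch; index values are kept in a dictionary standing in for the function parameters, since the loop index is one of the function's parameters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LoopStructure:
    order: int                     # loop number: the lowest number is the inner loop
    index: str                     # function parameter used as the loop index
    initial: int                   # initial loop-index value
    change: Callable[[int], int]   # loop-index change calculation
    ending: Callable[[int], bool]  # loop-ending calculation (True means stop)


def run_loops(loops: List[LoopStructure], body: Callable[[Dict[str, int]], None]) -> None:
    """Call body once per index combination, incrementing the lowest-numbered
    (innermost) loop first. The ending condition is tested before each pass."""
    nesting = sorted(loops, key=lambda loop: loop.order, reverse=True)  # outermost first

    def run(depth: int, params: Dict[str, int]) -> None:
        if depth == len(nesting):
            body(dict(params))  # pass a copy of the current parameter values
            return
        loop = nesting[depth]
        params[loop.index] = loop.initial
        while not loop.ending(params[loop.index]):
            run(depth + 1, params)
            params[loop.index] = loop.change(params[loop.index])

    run(0, {})


visits = []
run_loops(
    [
        LoopStructure(order=1, index="i", initial=0, change=lambda i: i + 1, ending=lambda i: i >= 2),
        LoopStructure(order=2, index="j", initial=0, change=lambda j: j + 10, ending=lambda j: j >= 20),
    ],
    lambda p: visits.append((p["j"], p["i"])),
)
print(visits)  # [(0, 0), (0, 1), (10, 0), (10, 1)]
```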

Functional Decomposition Graphical Model

At this point, the ideas of the prior sections are manually incorporated into a simple graphical model (e.g., a functional decomposition diagram 700, described below with respect to FIG. 7 et seq.) that ensures that all of the transitions are exposed. The functional decomposition diagram 700 is then input into graphics storage 108 and translated via graphics translation module 102 into corresponding functions in accordance with the MPfd decomposition methods described herein. The translated functions may be stored in memory area 108.

It should be noted that a looping structure can be attached to any decomposition element. This looping structure initializes some data element (variable, array element, or matrix element), performs a calculation on the data element, tests the changed element value for the ending condition, and then transitions to the next required functional decomposition element if the condition is met. The data element used as the loop index is one of the function parameters, allowing the looping structure to interact with the functional element.

Highest Level of Decomposition

Level 0 of the MPfd consists of only three types of objects: (1) terminators, (2) a single-process bubble (or other indicator) corresponding to the un-decomposed function, and (3) data stores, along with function transitions, loops, and function parameters. The purpose of the highest level of decomposition is to place a function into a larger context. This is accomplished by allowing systems that are external to the function to transmit data and control to/from the function. A terminator represents a complete external system. FIG. 6 shows an example of the highest level (level 0) decomposition. The “Function Transition Conditions” of FIG. 6 correspond to the “Transition Conditions” shown in FIG. 4. The “Process Bubble Name” of FIG. 6 corresponds to function “g( )” of Equation 1 and FIGS. 2-5. The “Function Parameter Names” of FIG. 6 correspond to the parameters shown in Equation 1 and FIGS. 2-5.

Terminators

A terminator may be represented as a labeled square. The purpose of terminators is to identify interfaces to outside systems. These interfaces do not correspond to any mathematical functions but instead represent access to data outside of the un-decomposed function. A terminator can be used to represent anything from another computer system to a display screen. Functionally, a terminator behaves similarly to a data store in that data can be sent from/to the terminator from/to the un-decomposed function. The difference between a terminator and a data store is that a terminator can transition from/to the un-decomposed function.

Process Bubble

A process bubble adds data, changes data, deletes data, or moves data. Since a process bubble manipulates data, all activities associated with sending data to and receiving data from various stores are allowed. Furthermore, since a data element can also serve as a signal, activities associated with various signals are also allowed. A process bubble, as employed in the MPfd model, is a graphical indicator of a data transformation, which is a task that accepts input data and transforms it to generate output data.

Exemplary Allowed Process Bubble Activities

1) Send data to a data store using output data flow

2) Receive data from a data store using input data flow

3) Send standard signals to control-bubbles

4) Receive standard signals from control-bubbles

5) Send standard signals to terminators

6) Receive standard signals from terminators

7) Send data to terminators

8) Receive data from terminators

Single-Process Bubble

The single-process bubble of the highest level of decomposition represents the un-decomposed function. Since the function is not decomposed, there can be only one level-0 process bubble. It is assumed that the level-0 process bubble will be decomposed into other functions.

Data Stores

A function typically transforms data. One way to graphically depict the transmission of data from/to the single-process bubble is via a terminator. Another way is with a data store. The displayed data stores can send/receive parameter data to/from the single-process bubble.

Control Bubble

A control bubble is a graphical indicator of a control transformation, which evaluates conditions and sends and receives control to/from other control transformations and/or data transformations. A control bubble symbol indicates a structure that performs only transitions that control the processing flow of a system, and which does not perform processing.

Conversion of MPfd to Finite State Machine

A primary goal of functional decomposition is the conversion of an MPfd into a finite state machine. This conversion is enabled by adhering to the following rules:

1) There can be only one control bubble at each decomposition level.

2) Only a control bubble can invoke a process bubble.

3) A process bubble can only transmit data to or receive data from a data store via a data flow.

4) A control bubble can only receive and use data as part of determining which process bubble is to be called.

5) A control bubble can use process bubbles that have completed to sequence to other process bubbles.

6) Data used by a control bubble must be from a process flow.

7) Process bubbles always return control to their calling control bubble.

8) A control bubble can receive/use/send control signals from/to control flows.

9) Process bubbles can decompose into simpler process bubbles and/or a single control bubble and process bubbles.
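Several of these rules are structural and can be checked mechanically. The Python sketch below checks only rules 1 to 3 for a single decomposition level, using a hypothetical `Level` description (bubble kinds, data store names, invocation edges, and data-flow edges). It does not model terminators or run-time rules such as rule 7, so it is a partial validator under stated assumptions rather than a complete one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Level:
    bubbles: Dict[str, str]  # bubble name -> "control" or "process"
    stores: Set[str] = field(default_factory=set)  # data store names
    invocations: List[Tuple[str, str]] = field(default_factory=list)  # (caller, callee)
    data_flows: List[Tuple[str, str]] = field(default_factory=list)   # (source, destination)


def check_level(level: Level) -> List[str]:
    """Return a description of every violation of rules 1-3 at this level."""
    errors: List[str] = []

    # Rule 1: at most one control bubble per decomposition level.
    controls = [name for name, kind in level.bubbles.items() if kind == "control"]
    if len(controls) > 1:
        errors.append(f"rule 1: {len(controls)} control bubbles at one level")

    # Rule 2: only a control bubble can invoke a process bubble.
    for caller, callee in level.invocations:
        if level.bubbles.get(callee) == "process" and level.bubbles.get(caller) != "control":
            errors.append(f"rule 2: {caller} invokes process bubble {callee}")

    # Rule 3: a process bubble exchanges data only with a data store.
    for src, dst in level.data_flows:
        ends = {src, dst}
        touches_process = any(level.bubbles.get(e) == "process" for e in ends)
        if touches_process and not (ends & level.stores):
            errors.append(f"rule 3: data flow {src}->{dst} does not touch a data store")

    return errors


bad = Level(
    bubbles={"C": "control", "1": "process", "2": "process"},
    stores={"S"},
    invocations=[("C", "1"), ("1", "2")],
    data_flows=[("1", "S"), ("1", "2")],
)
for problem in check_level(bad):
    print(problem)
```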

An exemplary algorithm for converting an MPfd to a finite state machine is shown in FIG. 6A and described below.

Conversion Algorithm

Step 605: Compare decomposition level x with level x+1 and determine whether the level x+1 process bubbles are associated or unassociated. A functional decomposition element, herein represented by a bubble symbol, can decompose into two types: associated and unassociated. Association has to do with the next-level decomposition of the bubble. Depending on the association type, loops defined at a higher decomposition level behave differently when they are integrated into a lower decomposition level.

If an un-decomposed bubble labeled “A” is decomposed into bubbles labeled “1”, “2”, “3”, and “C”, then the un-decomposed bubble is said to reside at Level 1. Bubbles “1”, “2”, “3”, and “C” are said to reside at Level 2. If a control flow links together any Level 2 bubbles, then those bubbles are said to be associated. If the control flows do not link together the Level 2 bubbles, those bubbles are said to be unassociated.

Step 610: If the level x+1 process bubbles are associated, then perform the following Steps 615-630.

Step 615: Any loops found at level x start with the first associated process bubble and end with the last associated process bubble; that is, multiple states are in the loop. All loops are associated with the set of process bubbles. This step machine-analyzes the design and correctly interprets how the loops work. Using information from one decomposition level to the next allows the system to change the algorithm definition file 116 such that the loops are executed correctly.

Step 620: The single control bubble that associates the level x+1 process bubbles becomes the first state of the level x+1 FSM.

Step 625: Level x+1 control flows are translated into state-transition vectors of the level x+1 FSM.

Step 630: Level x+1 process bubbles are translated into the states of the FSM.

Step 635: If the level x+1 process bubbles are unassociated, then perform the following steps.

Step 640: Any loops found at level x will form a loop of the same type on each unassociated level x+1 process bubble.

Step 645: Decompose any non-recursively defined process bubble into an “x+1” level of the decomposed process bubble. Decomposition levels are complete when an “x+1” decomposition has no control bubble (a group of unassociated process bubbles) or when there is no “x+1” level (Step 650). All level x+1 data stores are hidden within the states of the FSM. The various “x+1” levels are represented as nested states; that is, each state is also an FSM.
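A minimal Python sketch of Steps 620 through 645 for a single level follows. It assumes a hypothetical representation (`Bubble`, `Decomposition`, `FSM`, and `to_fsm` are illustrative names) in which control flows are (source, target, condition) triples; the loop handling of Steps 615 and 640 and the data stores hidden inside states are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bubble:
    name: str
    # Next-level decomposition; None when the bubble is not decomposed.
    children: Optional["Decomposition"] = None


@dataclass
class Decomposition:
    control: Optional[str]  # the single control bubble, or None (unassociated)
    processes: list[Bubble]
    # Control flows as (source, target, transition-selection condition).
    flows: list[tuple[str, str, str]] = field(default_factory=list)


@dataclass
class FSM:
    initial: Optional[str]
    states: dict[str, Optional["FSM"]]  # state name -> nested FSM, if any
    transitions: list[tuple[str, str, str]]


def to_fsm(level: Decomposition) -> FSM:
    states: dict[str, Optional[FSM]] = {}
    if level.control is not None:
        # Step 620: the control bubble becomes the first state.
        states[level.control] = None
    for bubble in level.processes:
        # Step 630: each process bubble becomes a state. Step 645: a decomposed
        # bubble becomes a nested state, i.e. a state that is itself an FSM.
        nested = to_fsm(bubble.children) if bubble.children is not None else None
        states[bubble.name] = nested
    # Step 625: control flows become the state-transition vectors.
    return FSM(initial=level.control, states=states, transitions=list(level.flows))
```

Because Python dictionaries preserve insertion order, the control bubble is always listed first, matching its role as the initial state.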

FIG. 7 shows an exemplary functional decomposition diagram 700, and FIG. 8 shows a finite state machine view of the translation of a single-process bubble into its state machine equivalent. As used herein, the term “bubble” refers to a graphical element such as a solid or dashed line having the approximate form of a circle, ellipse, polygon, or the like. Notice that the control bubble is shown in the finite state machine view as the first state; only the control flows are seen, and these act as state transitions. The looping structure is captured as a looping state transition in the finite state machine 800. The process bubbles are translated into the states of the finite state machine. The data stores are captured as part of the states. Throughout this document, where applicable, both the functional decomposition and finite state machine views are shown in the Drawings.

Lower Level Decomposition

All decomposition levels below level 0 have one additional item: the control bubble. There is only one control bubble per function decomposition. The purpose of the control bubble symbol is to indicate a structure that performs only transitions and does not perform processing. This symbol has the effect of ensuring that all non-looping control is fully exposed. Allowing only a single control bubble per function decomposition forces the complexity of the work to be expressed primarily through decomposition, ensuring a structured decomposition with the minimum amount of complexity for each of the decompositions. The control bubble retains the name of the higher-level process bubble.

FIGS. 9 and 10 respectively show functional decomposition and finite state machine views of an example of a lower level decomposition. The process bubbles cannot directly send information from one process bubble to another but can do so through a data store. If the data store has the same name, the finite state machine view assumes it will have the same memory addresses. Likewise, a process bubble cannot directly transition to another process bubble but can do so through a control bubble, which is always the initial state.

Multiple Loops

To denote multiple loops, each loop is defined separately. FIGS. 11 and 12 respectively show functional decomposition and finite state machine views of multiple loops. As shown in FIGS. 10 and 11, “LPBN1” represents “Lower Process Bubble Name 1”.

Because multiple loop definitions can take up so much space on the diagram, a label representing a loop definition table can be used instead, changing the loop display to that shown in FIGS. 13 and 14, which respectively show functional decomposition and finite state machine views of an exemplary looping operation.

Selecting the loop name can cause the loop definition(s) to be displayed as shown in Table 1, below:

TABLE 1 EXAMPLE LOOP LABEL DEFINITION
Loop Name    Initial Index Value      Index Calculation      Loop End Condition
Loop 1       Initial index value 1    Index Calculation 1    Loop End Condition 1
Loop 2       Initial index value 2    Index Calculation 2    Loop End Condition 2

All loops associated with a process bubble are considered nested loops: one loop is within another loop. The first loop defined is considered the inner-most loop, with each successively defined loop surrounding the loop defined before it. Thus, the example given in FIG. 11 and Table 1 means that Loop 1 is inside of Loop 2; that is, Loop 1 is invoked after Loop 2. Parallel loops occur when two or more process bubbles, without any mutual dependency and occurring at the same decomposition level, each have a loop. The loops of these independent, loop-bearing process bubbles can occur in parallel.
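The nesting convention can be illustrated with a short Python sketch. It assumes a hypothetical `LoopDef` record mirroring the columns of Table 1 and treats the loop end condition as a test made before each pass; the list order is the definition order, so the first entry is the inner-most loop.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class LoopDef:
    name: str
    initial: int                   # initial index value
    step: Callable[[int], int]     # next-index calculation
    done: Callable[[int], bool]    # loop end condition, tested before each pass


def nested_iterations(loops: list[LoopDef]) -> Iterator[dict[str, int]]:
    """Yield index bindings; loops[0] (first defined) is the inner-most loop."""
    def run(level: int, bound: dict[str, int]) -> Iterator[dict[str, int]]:
        if level < 0:
            yield dict(bound)  # snapshot of all current index values
            return
        loop = loops[level]
        i = loop.initial
        while not loop.done(i):
            bound[loop.name] = i
            yield from run(level - 1, bound)
            i = loop.step(i)

    # Start from the last-defined (outer-most) loop.
    yield from run(len(loops) - 1, {})


loops = [
    LoopDef("Loop 1", 0, lambda i: i + 1, lambda i: i >= 2),
    LoopDef("Loop 2", 0, lambda i: i + 1, lambda i: i >= 3),
]
for binding in nested_iterations(loops):
    print(binding)  # Loop 1 varies fastest, i.e. it is nested inside Loop 2
```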

Data Elements Variables, Arrays, and Matrices

Variables, arrays, and matrices represent data elements of various orders. A variable is a single data element of a certain type and can be thought of as a zero-dimensional object. An array consists of multiple data elements arranged linearly and can be thought of as a single-dimensional object. A matrix consists of multiple data elements arranged in more than one dimension and can be thought of as a higher-dimensional object. Transitions and loops can use these data objects in their conditions and calculations, so there must be a precise way to identify every data object.

As with the looping structures, there can be multiple data elements per input/output data line or transition. The line or transition can therefore be identified using a label that points to the appropriate definition, as shown in FIGS. 15 and 16, which respectively show functional decomposition and finite state machine views.

Selection of the labeled transition in FIG. 16 would then display:

TRANSITION NAME
Condition 1    Type1:name1 > 2
Condition 2    Type3:name3 = 12.5

Selection of the labeled data line in FIG. 16 would then display:

DATA LINE NAME
Data Element 1    Type2:name2
Data Element 2    Type3:name3

Variables

A variable only requires a label and a type in order to identify it. The following composite label will fully identify a variable:

Type:variableName

The composite variable name changes the “Function Parameters Names” to a comma-separated list of composite variable names, as shown in FIG. 17, which is a functional decomposition view of an exemplary lower level decomposition with composite variable names.

Arrays

An array requires a composite consisting of a label, a type, and an array index or element number to identify it. The following composite label will fully identify an array:

Type:variableName:“index or element #”

If the symbol after the second colon is a Greek symbol, it represents an index; otherwise, it represents an array element. In MPfd, the first index represents a row, the second index a column, and the third index the matrix depth.
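A minimal parser for the variable and array composite labels might look as follows. The `parse_label` and `CompositeLabel` names are hypothetical, and accepting several comma-separated indices or elements after the second colon (row, column, depth) is an assumption; Greek letters are detected through their Unicode character names.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass
class CompositeLabel:
    type_name: str
    name: str
    # ("index", symbol) or ("element", value); order = row, column, depth.
    selectors: list[tuple[str, str]] = field(default_factory=list)


def is_greek(symbol: str) -> bool:
    """True when every character of the symbol is a Greek letter."""
    return bool(symbol) and all(
        "GREEK" in unicodedata.name(ch, "") for ch in symbol
    )


def parse_label(label: str) -> CompositeLabel:
    """Parse Type:variableName or Type:variableName:index-or-element."""
    parts = label.split(":", 2)
    if len(parts) < 2 or not all(parts[:2]):
        raise ValueError(f"not a composite label: {label!r}")
    result = CompositeLabel(parts[0], parts[1])
    if len(parts) == 3:
        for item in (s.strip() for s in parts[2].split(",")):
            # Greek symbol -> index; anything else -> explicit array element.
            kind = "index" if is_greek(item) else "element"
            result.selectors.append((kind, item))
    return result
```

For example, `parse_label("double:A:α,β")` yields two index selectors, while `parse_label("double:A:5")` selects the explicit element 5.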

Designating multiple array elements does not designate a loop, only the movement of a certain number of variables.

The composite array name changes the “Function Parameters Names” to a comma-separated list of composite array names, as shown in FIG. 18 (lower level decomposition diagram without composite array names and dimensionality) and FIG. 19 (lower level decomposition diagram with composite array names and dimensionality).

Matrices

A matrix requires a composite consisting of a label, a type, and multiple array element designations to identify it. The following composite label will fully identify a matrix:

Type:variableName a,b, . . . n

Each matrix element represents a matrix dimension. The first element represents the first dimension, the second element the second dimension, and so on.

The composite matrix name changes the “Function Parameters Names” to a comma-separated list of composite matrix names, as shown in FIG. 20, which illustrates a lower level decomposition with composite matrix names with multiple dimensions.

Profiling to Determine Node Count

Determining how well a process bubble will scale requires knowing how much exposed work and how much exposed communication time is present. The work time can be obtained by measuring the execution time of the process bubble's attached code with data of a known size. The data comes from the test plans and procedures that are attached to every process bubble of every project designed using the MPfd model. The communication time comes from the a priori determination of actual communication time and actual latency time. As long as the following criterion is met, computational elements can be added to increase the processing performance of a process bubble, as shown in Equation 2:

$$\frac{S_t}{M_t + E_t} > T \qquad \text{(Equation 2: Profile Parallel Target)}$$

Where:

-   S_t = single-node processing time
-   M_t = multi-node processing time
-   E_t = exposed communication time

The target value T can be set by the present system. Profiling will continue until the condition is no longer met. The minimum, maximum, and median dataset sizes associated with a design bubble for a particular kernel or algorithm are used to calculate the number of processing elements for any dataset size greater than the minimum and less than the maximum.
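Equation 2 can be read as a stopping rule for profiling. The Python sketch below assumes the multi-node time and the exposed communication time are supplied as functions of the node count (hypothetical callbacks standing in for measured profiles) and adds nodes until the ratio no longer exceeds T; it does not model the interpolation over minimum, maximum, and median dataset sizes.

```python
from typing import Callable


def max_useful_nodes(
    single_node_time: float,                    # S_t
    multi_node_time: Callable[[int], float],    # M_t as a function of node count
    exposed_comm_time: Callable[[int], float],  # E_t as a function of node count
    target: float,                              # T
    max_nodes: int = 4096,
) -> int:
    """Add nodes while S_t / (M_t + E_t) > T; return the last count that passed."""
    best = 1
    for n in range(2, max_nodes + 1):
        ratio = single_node_time / (multi_node_time(n) + exposed_comm_time(n))
        if ratio <= target:
            break  # criterion no longer met: profiling stops here
        best = n
    return best


# Hypothetical cost model: work divides evenly, communication grows linearly.
print(max_useful_nodes(100.0, lambda n: 100.0 / n, lambda n: 0.5 * n, 1.0))  # 198
```

Under the same cost model, a target of T = 4 already fails at two nodes (the ratio is about 1.96), so the sketch returns 1; the rule stops at the first node count that misses the target.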

Automatic Selection of Data Movement Model

In computer science parlance, there are two ways to transmit data into a function: pass-by-value and pass-by-reference. Pass-by-value simply means that only the contents of some memory location are transmitted to the function. Sending the contents of a memory location is equivalent to having a constant as an input parameter. That is, all changes made to the value are kept internal to the function, with none of those changes accessible outside of the function. This provides for the “encapsulation” of data, ensuring that unwanted side effects do not occur between functions. Pass-by-reference allows a function to have multiple output parameters.

The following information is associated with a data element on an MPfd: composite name, input designation, and output designation. The input/output designations are a function of the directions of the lines associated with the composite name. The three possibilities are input, output, or both.

Pass by Value

In an MPfd, pass-by-value is another way of saying that a scalar data element (not an array or matrix) is only input into a function, never output from a function. A constant value must also be passed by value, as there is no variable and hence no possibility of referencing a memory location. The input-only scalar data element or constant must use pass-by-value, ensuring that the data use is encapsulated. Thus, whenever a scalar or constant input is used in an MPfd, it signifies the use of the pass-by-value method.

Pass by Reference

If the composite name in an MPfd refers to vector data (an array or matrix), particular data elements must be accessible. In computer programming, such access occurs as an offset to some base location. Thus, the base memory location must be transmitted to the function. Also, if the contents of a memory location must change (as is the case for output scalars), the memory location of the data element needs to be known. In both cases, a memory location is passed to the function (called referencing), and the contents of the memory location(s) are accessed (called dereferencing). This allows the memory locations to be accessed and changed, with the changes visible to other functions simply using the same dereferencing method.

Functional Decomposition Data Transmission Model

Since an MPfd can determine the data transmission model (pass-by-value or pass-by-reference) automatically from information generated as part of the MPfd, one of the most confusing aspects of modern computer programming can now be performed automatically, from design.
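The selection rules of the two preceding subsections reduce to a small decision function. The following Python sketch uses hypothetical names (`Direction`, `direction_of`, `transmission_model`); the dimension count and line directions are assumed to come from the composite name and its attached data lines.

```python
from enum import Enum


class Direction(Enum):
    INPUT = "input"
    OUTPUT = "output"
    BOTH = "both"


def direction_of(flows_in: bool, flows_out: bool) -> Direction:
    """Derive the designation from the directions of the attached data lines."""
    if flows_in and flows_out:
        return Direction.BOTH
    if flows_in:
        return Direction.INPUT
    if flows_out:
        return Direction.OUTPUT
    raise ValueError("data element is not connected to the function")


def transmission_model(dimensions: int, direction: Direction,
                       is_constant: bool = False) -> str:
    if is_constant:
        return "pass-by-value"      # no variable, so nothing to reference
    if dimensions == 0 and direction is Direction.INPUT:
        return "pass-by-value"      # input-only scalar: encapsulated copy
    # Arrays and matrices are reached via a base address plus offset, and
    # output scalars must be written back: both need a memory location.
    return "pass-by-reference"
```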

Automatic Detection of Parallel Algorithm Decomposition

There are two types of parallel processing indicators that can be included on MPfd design diagrams: structural and non-structural. Structural parallel indicators are determined by the design without any extra information. Task parallelism is an example of structural indication. Other types of parallelism detectable via structural indication include transpose detection, parallel I/O detection, scatter detection, and gather detection.

Non-structural parallel indicators need more information than is usually given in a design in order to determine the type of parallelism. Variable definitions in computer languages only support the following information: variable name, variable type, and number of dimensions. Parallelizing code requires two other types of information: topology and data intent. Topology defines the computational behavior at the edges of a vector or matrix; examples include Cartesian, toroidal, and spherical.

Data intent is the intended use of the data; examples include:

(1) particle-like usage: the data represents particles that move throughout a matrix and may interact,
(2) field-like usage: a force that affects, to some degree, data across a large section of the matrix simultaneously,
(3) search-like intent: data that interacts with a larger set of data, giving some result, and
(4) series expansions/contractions: calculation of the terms of a mathematical series.

The present MPfd method allows a designer to indicate the algorithm processing topology and the data intent, giving the design the information required to complete the parallel processing. The topology can be calculated by the present system 100 based upon the data intent. Alternatively, the topology information can be added by the designer to the vector or matrix information of the input data of a transformation.

Since an algorithm is defined as a functional decomposition element, it can be decomposed into multiple, simpler algorithms and/or kernels. As previously noted, a functional decomposition element, herein represented by a bubble symbol, can decompose into two types: associated and unassociated. Association has to do with the next-level decomposition of the bubble. Depending on the association type, loops defined at a higher decomposition level behave differently when they are integrated into a lower decomposition level.

If the un-decomposed bubble labeled “A” is decomposed into bubbles labeled “1”, “2”, “3”, and “C”, then the un-decomposed bubble is said to reside at Level 1. Bubbles “1”, “2”, “3”, and “C” are said to reside at Level 2. If the control flows link together the Level 2 bubbles, then those bubbles are said to be associated. FIG. 21 shows an example of associated Level 2 bubbles linked via control flows.

If a looping structure is added to Level 1 (Bubble A), then this is interpreted to have the following effect on Level 2: 1) the loop will start with the activation of the first process bubble and end with the last process bubble ending, 2) the loop will continue to restart the first process bubble until the end-of-loop condition occurs, and 3) upon completion of the loop, control will be transferred back to the original level-1-defined control bubble or terminator. This is also shown in FIG. 21.

If the control flows do not link together the Level 2 bubbles, those bubbles are said to be unassociated. FIG. 22 shows an example of unassociated Level 2 bubbles.

If a looping structure is added to Level 1 (Bubble A), then the looping structure is added to each of the unassociated Level 2 bubbles. This is shown in FIG. 23. It is possible for Level 2 bubbles to appear to be unassociated because no control flow binds them, but to be associated instead via data. Data-associated Level 2 bubbles are shown in FIG. 23.

Similarly, it is possible for Level 2 bubbles that use the same control structure to actually be unassociated, as long as neither the control flows nor the data associate them. This type of unassociated bubble structure is shown in FIG. 24.

If the decomposition is incorrect, it is sometimes possible to rearrange the decomposition based upon association. An example of this transformation to standard unassociated form is shown in FIG. 25. Similarly, it is sometimes possible to rearrange the decomposition based upon un-association, as shown in FIG. 26, which is an example showing transformation to standard associated form.
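The association test and the resulting loop placement described above can be sketched in Python as follows. This is illustrative only: the function names are hypothetical, links are plain (source, target) pairs, and a group is treated as associated when any control flow or data link joins two of its members.

```python
from typing import Iterable


def is_associated(bubbles: Iterable[str],
                  control_flows: Iterable[tuple[str, str]],
                  data_links: Iterable[tuple[str, str]] = ()) -> bool:
    """True if any control flow or data link joins two bubbles of the group."""
    members = set(bubbles)
    links = list(control_flows) + list(data_links)
    return any(a in members and b in members for a, b in links)


def propagate_loop(loop: str, bubbles: list[str],
                   associated: bool) -> list[tuple[str, list[str]]]:
    """Return (loop, bubbles spanned) pairs for a loop defined one level up."""
    if associated:
        # One loop: starts with the first bubble, ends after the last one.
        return [(loop, list(bubbles))]
    # Unassociated: a loop of the same type on each bubble separately.
    return [(loop, [b]) for b in bubbles]
```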

Unassociated Process Bubbles Indicating Task Parallelization

When process bubbles are grouped together but are not associated, those processes can occur at the same time if the tasks are executed on parallel hardware. FIG. 27 shows the translation of unassociated process bubbles into a finite state machine that indicates task parallelism. Block 2700 indicates a new state made by the system, creating task-level parallelism.
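On a general-purpose runtime, unassociated bubbles can simply be submitted as independent tasks. A minimal Python sketch, assuming each process bubble is available as a zero-argument callable (a hypothetical packaging); `ThreadPoolExecutor` is used for brevity, and `ProcessPoolExecutor` would be the usual choice for CPU-bound Python code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def run_task_parallel(tasks: dict[str, Callable[[], object]],
                      max_workers: Optional[int] = None) -> dict[str, object]:
    """Run mutually unassociated process bubbles concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every bubble at once: none depends on another's control or data.
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        # Collecting the results acts as the join before control returns.
        return {name: f.result() for name, f in futures.items()}


results = run_task_parallel({
    "1": lambda: sum(range(10)),
    "2": lambda: max([3, 9, 4]),
})
print(results)  # {'1': 45, '2': 9}
```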

Transpose Notation

By indicating to the functional decomposition elements that a vector's or an array's data comes in, is processed, and then leaves, an opportunity to perform a scatter/gather operation (described below) is defined. The indices on an input vector or matrix are reversed on the output version of the same matrix, and the indices are found in the loop, as shown in FIG. 28, which shows transpose notation in functional decomposition view. Note that the accent mark by the second “A” means that at least one element of array A has been changed. FIG. 29 shows transpose notation in finite state machine view.
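The structural test for transpose notation can be sketched in Python, assuming index symbols are available as tuples of strings (`is_transpose` and `transpose` are hypothetical helper names); the second function only shows the data movement that the notation implies.

```python
from typing import Sequence


def is_transpose(in_idx: Sequence[str], out_idx: Sequence[str],
                 loop_idx: set[str]) -> bool:
    """Output indices reverse the input indices, and all are loop indices."""
    return (
        len(in_idx) >= 2
        and tuple(out_idx) == tuple(reversed(in_idx))
        and tuple(out_idx) != tuple(in_idx)
        and set(in_idx) <= loop_idx
    )


def transpose(matrix: list[list[float]]) -> list[list[float]]:
    """The data movement the notation implies: A'[j][i] = A[i][j]."""
    return [list(row) for row in zip(*matrix)]


print(is_transpose(("α", "β"), ("β", "α"), {"α", "β"}))  # True
print(transpose([[1, 2, 3], [4, 5, 6]]))                  # [[1, 4], [2, 5], [3, 6]]
```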

Scatter/Gather Notation

A scatter/gather moves data to multiple nodes or gathers information from multiple nodes. The indices of the loops match the active indices of the data, and the order of the data indices does not change. FIG. 30 shows an example of scatter/gather notation in functional decomposition view, and FIG. 31 shows the corresponding finite state machine view. Note that if bubble 1 is the first activated process bubble, then “A′” is an input; if bubble 1 is the last process bubble, then “A” is an output matrix.
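Detection and data movement for scatter/gather can be sketched similarly. The Python below is an in-memory simulation under assumed names; `scatter` uses contiguous row blocks whose sizes differ by at most one, which is one common choice rather than a partitioning prescribed by MPfd.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def is_scatter_gather(in_idx: Sequence[str], out_idx: Sequence[str],
                      loop_idx: set[str]) -> bool:
    """Loop indices match the data indices, and their order is unchanged."""
    return tuple(in_idx) == tuple(out_idx) and set(in_idx) == loop_idx


def scatter(rows: list[T], n_nodes: int) -> list[list[T]]:
    """Split rows into contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(len(rows), n_nodes)
    blocks, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        blocks.append(rows[start:start + size])
        start += size
    return blocks


def gather(blocks: list[list[T]]) -> list[T]:
    """Concatenate per-node blocks back into one sequence, in node order."""
    return [row for block in blocks for row in block]
```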

Parallel Input/Output Indication

Parallel input and output is defined as being from/to a terminator block. Since a terminator block represents another system interfacing with the system currently under design, obtaining data from this external system is considered input and transmitting data to this external system is considered output. Inputs and outputs to/from terminator blocks can designate that data for the same vector or matrix is being received or sent via separate, parallel data lines by adding the “[ ]” designator to the vector or matrix index. For example, the following parallel input-data streams are defined, as shown in FIG. 32:

-   A , _([0-10]) = 2-dimensional array “A” with indexes and Elements 0 through 100 of index and elements 0 through 10 of index are input.
-   A , _([0-10]) = 2-dimensional array “A” with indexes and Elements 101 through 200 of index and elements 0 through 10 of index are input.
-   A , _([0-10]) = 2-dimensional array “A” with indexes and

Output works analogously. If separate vector or matrix elements are input/output to/from a process bubble but not to/from a terminator, then a simple element selection is indicated. An example of selecting particular matrix elements is shown in FIG. 33, wherein process element “1” receives data elements from the “A” matrix rows 0 through 100 and columns 0 through 10.

Decomposition Completeness

The present system can automatically determine whether a functional decomposition is complete, as indicated in FIGS. 34A/34B, which illustrate examples of incomplete decomposition. One example of incomplete decomposition is shown in FIG. 34A. If there is at least one algorithm (bubble 3 in the left-hand diagram, or bubble 2 in the right-hand diagram) that does not decompose into only process and control kernels (the remaining bubbles in FIG. 34A), then the decomposition is incomplete. Another example of incomplete decomposition is shown in FIG. 34B. If there is a bubble that does not have at least one input and one output, then the decomposition is considered incomplete.
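Both completeness criteria can be checked by a recursive walk over the decomposition. A minimal Python sketch, assuming a hypothetical `Node` record whose `kind` is "algorithm", "process_kernel", or "control_kernel" and whose input and output counts are already known:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str                # "algorithm", "process_kernel" or "control_kernel"
    inputs: int = 1
    outputs: int = 1
    children: list["Node"] = field(default_factory=list)


def incompleteness(node: Node) -> list[str]:
    """Return reasons why the decomposition rooted at node is incomplete."""
    problems: list[str] = []
    # FIG. 34B criterion: every bubble needs at least one input and one output.
    if node.inputs < 1 or node.outputs < 1:
        problems.append(f"{node.name}: needs at least one input and one output")
    # FIG. 34A criterion: an algorithm must decompose further, eventually
    # into only process and control kernels.
    if node.kind == "algorithm" and not node.children:
        problems.append(f"{node.name}: algorithm not decomposed into kernels")
    for child in node.children:
        problems.extend(incompleteness(child))
    return problems
```

An empty result means both criteria hold for every bubble in the tree.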

Cross-Communication Notation

Data-type issues typically revolve around the concept of data primitive types: integer, real, double, complex, float, string, binary, etc. Groups of data entities are discussed via their dimensionality, as structures, or as structures containing data entities with various dimensionalities. Data primitives, data group structure, and dimensionality all represent a static view of the data. In an MPfd, this information is placed in a table that appears on data flows and data stores. Table 2, below, is an example of a table that provides this information.

TABLE 2 VARIABLE DESCRIPTION

The variable name gives a name to an object for the Decomposition Analysis graph. The description is a text description of the variable just named. The variable type is the data-primitive type. The number of dimensions describes the dimensionality of the variable: 0-dimension means a standard variable, 1-dimension a vector, and >1-dimension a matrix. The dimension size is required for >1-dimensional objects to indicate the number of variable objects that occur in each dimension. The topology explains how the >0-dimensional object treats its space.

The following are potential topologies: unconnected edges: Cartesian; connected edges: 1 dimension (ring), 2 dimensions (cylindrical, toroid, spherical), and 3 dimensions (hyper-cube). The topology information follows the variable.
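The effect of topology on edge behavior can be illustrated by a neighbor lookup. In the Python sketch below, the per-dimension wrap flags are an assumption (in particular, which axis wraps for a cylinder), the function and table names are hypothetical, and spherical and hyper-cube topologies are not modeled.

```python
from typing import Optional


def neighbor(index: int, offset: int, size: int, wraps: bool) -> Optional[int]:
    """Neighbor along one dimension; None off an unconnected (Cartesian) edge."""
    j = index + offset
    if wraps:
        return j % size          # connected edge (ring): wrap around
    return j if 0 <= j < size else None


# Assumed per-dimension wrap flags (rows, columns) for 2-dimensional data.
TOPOLOGY_WRAPS = {
    "cartesian": (False, False),
    "cylindrical": (True, False),   # assumption: rows wrap, columns do not
    "toroid": (True, True),
}


def neighbors_2d(r: int, c: int, shape: tuple[int, int],
                 topology: str) -> list[tuple[int, int]]:
    """Up, down, left, and right neighbors of cell (r, c) under a topology."""
    wrap_r, wrap_c = TOPOLOGY_WRAPS[topology]
    found = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr = neighbor(r, dr, shape[0], wrap_r)
        nc = neighbor(c, dc, shape[1], wrap_c)
        if nr is not None and nc is not None:
            found.append((nr, nc))
    return found
```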

In computer systems, data is rarely static; it is moved, transformed, combined, and taken apart: data in computer systems is typically dynamic. The dynamic use of the data is an attribute that is not typically shown in standard representations of data for computer use. With the advent of parallel processing, the dynamic aspects of the data are needed for the selection of the proper parallel processing technique. Examples of the graphical depiction of possible dynamic data usage are shown below.

Monotonic Data Use

Concept: Linked calculations whose workload grows or shrinks after each calculation.

Use: Whenever the workload changes monotonically for each component calculation in a series of calculations.

Example Use: Arbitrary-precision series expansion calculation of transcendental numbers.

Parallel Issue: Load balancing. Since the workload changes monotonically, the last calculation has a workload that is very different from that of the first calculation. The computation time of a group of nodes working on a single problem equals the computation time of the slowest node, and naively placing the work in the same order as the calculation order concentrates the heaviest work onto a single node; this produces a non-optimal parallel solution.

Topology Effects: None

Action: Create a mesh to provide load balancing.

Action Example: The purpose of this mesh type is to provide load balancing when there is a monotonic change to the workload as a function of which data item is used. The profiler shall calculate the time it takes to process each element. In the following naive attempt to parallelize such a problem, sixteen work elements are distributed over four computational nodes, and the work increases or decreases monotonically with the work-element number. Table 3 shows this 1-dimensional naive work distribution of a monotonic workload-changing problem.

TABLE 3 NAIVE WORK DISTRIBUTION OF A MONOTONIC WORKLOAD-CHANGING PROBLEM
Node #           Node₁          Node₂          Node₃            Node₄
Work Elements    1, 2, 3, 4     5, 6, 7, 8     9, 10, 11, 12    13, 14, 15, 16

The mesh shown in Table 3 decomposes the work elements by dividing the number of work elements by the number of nodes and assigning the work elements to the nodes in a linear fashion.

Instead of linearly assigning work elements to nodes, the work elements can be alternated to balance the work. For monotonic workload changes, this means the first and last elements are paired, the second and second-to-last elements are paired, etc., as shown in Table 4:

TABLE 4 NON-NAÏVE 1-DIMENSIONAL WORK DISTRIBUTION OF A MONOTONIC WORKLOAD-CHANGING PROBLEM
Node #           Node₁           Node₂           Node₃           Node₄
Work Elements    1, 16, 2, 15    3, 14, 4, 13    5, 12, 6, 11    7, 10, 8, 9
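The two distributions in Tables 3 and 4 can be generated mechanically. The Python sketch below uses hypothetical function names and assumes the element count divides evenly; taking each element's number as its work, the naive node loads are 10, 26, 42, and 58, while each paired node carries 34.

```python
def naive_distribution(n_elements: int, n_nodes: int) -> list[list[int]]:
    """Table 3: consecutive blocks of work elements, numbered from 1."""
    if n_elements % n_nodes:
        raise ValueError("elements must divide evenly among nodes")
    per = n_elements // n_nodes
    return [list(range(k * per + 1, (k + 1) * per + 1)) for k in range(n_nodes)]


def paired_distribution(n_elements: int, n_nodes: int) -> list[list[int]]:
    """Table 4: pair first with last, second with second-to-last, and so on."""
    if n_elements % (2 * n_nodes):
        raise ValueError("element pairs must divide evenly among nodes")
    pairs = [(i, n_elements + 1 - i) for i in range(1, n_elements // 2 + 1)]
    per = len(pairs) // n_nodes
    nodes = []
    for k in range(n_nodes):
        chunk: list[int] = []
        for low, high in pairs[k * per:(k + 1) * per]:
            chunk.extend((low, high))
        nodes.append(chunk)
    return nodes


# Taking each element's number as its work, compare the per-node loads.
print([sum(n) for n in naive_distribution(16, 4)])   # [10, 26, 42, 58]
print([sum(n) for n in paired_distribution(16, 4)])  # [34, 34, 34, 34]
```

Calling `paired_distribution(64, 16)` reproduces, group by group, the cells listed in Table 5.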

FIG. 35 shows a 1-dimensional monotonic workload symbol in functional decomposition view. If a one-dimensional workload is monotonic, then that information is given to MPfd with the symbols shown in FIG. 35. The symbol αμ means that the work (represented as the work within a loop) changes monotonically and that this workload effect applies to vector “A”. That is, αμ means that index alpha is intended to access the data monotonically. Thus, alpha is the loop index, and mu is the intended use of the data accessed using the alpha index.

Note that, for brevity, the loop is defined by (index:calculation:condition), where the index is the loop index plus any clarifying symbol by the loop index, the calculation is the next index-value calculation, and the condition is the loop-ending condition. FIG. 36 shows a 1-dimensional monotonic workload symbol in finite state machine view. Table 5, below, shows a two-dimensional version of the monotonic workload-changing mesh.

TABLE 5 NON-NAIVE 2-DIMENSIONAL WORK DISTRIBUTION OF A MONOTONIC WORKLOAD-CHANGING PROBLEM
      X1                                  X2
Y1    1, 64, 2, 63      3, 62, 4, 61      5, 60, 6, 59      7, 58, 8, 57
      9, 56, 10, 55     11, 54, 12, 53    13, 52, 14, 51    15, 50, 16, 49
Y2    17, 48, 18, 47    19, 46, 20, 45    21, 44, 22, 43    23, 42, 24, 41
      25, 40, 26, 39    27, 38, 28, 37    29, 36, 30, 35    31, 34, 32, 33

If a two-dimensional workload is monotonic, then that information is given to MPfd with the following symbols. The symbol means that the work (represented as the work within a loop) changes monotonically and that this workload effect applies to vector “A”.

FIG. 37 shows a 2-dimensional monotonic workload symbol in functional decomposition view, and FIG. 38 shows a 2-dimensional monotonic workload symbol in finite state machine view.

Table 6, below, shows a three-dimensional version of the monotonic workload-changing mesh.

TABLE 6 NON-NAIVE 3-DIMENSIONAL WORK DISTRIBUTION OF A MONOTONIC WORKLOAD-CHANGING PROBLEM
          X1                                        X2
Z1  Y1    1, 256, 2, 255      3, 254, 4, 253      5, 252, 6, 251      7, 250, 8, 249
          9, 248, 10, 247     11, 246, 12, 245    13, 244, 14, 243    15, 242, 16, 241
    Y2    17, 240, 18, 239    19, 238, 20, 237    21, 236, 22, 235    23, 234, 24, 233
          25, 232, 26, 231    27, 230, 28, 229    29, 228, 30, 227    31, 226, 32, 225
Z2  Y1    33, 224, 34, 223    35, 222, 36, 221    37, 220, 38, 219    39, 218, 40, 217
          41, 216, 42, 215    43, 214, 44, 213    45, 212, 46, 211    47, 210, 48, 209
    Y2    49, 208, 50, 207    51, 206, 52, 205    53, 204, 54, 203    55, 202, 56, 201
          57, 200, 58, 199    59, 198, 60, 197    61, 196, 62, 195    63, 194, 64, 193
Z3  Y1    65, 192, 66, 191    67, 190, 68, 189    69, 188, 70, 187    71, 186, 72, 185
          73, 184, 74, 183    75, 182, 76, 181    77, 180, 78, 179    79, 178, 80, 177
    Y2    81, 176, 82, 175    83, 174, 84, 173    85, 172, 86, 171    87, 170, 88, 169
          89, 168, 90, 167    91, 166, 92, 165    93, 164, 94, 163    95, 162, 96, 161
Z4  Y1    97, 160, 98, 159    99, 158, 100, 157   101, 156, 102, 155  103, 154, 104, 153
          105, 152, 106, 151  107, 150, 108, 149  109, 148, 110, 147  111, 146, 112, 145
    Y2    113, 144, 114, 143  115, 142, 116, 141  117, 140, 118, 139  119, 138, 120, 137
          121, 136, 122, 135  123, 134, 124, 133  125, 132, 126, 131  127, 130, 128, 129

FIG. 39 shows a 3-dimensional monotonic workload symbol in functional decomposition view, and FIG. 40 shows a 3-dimensional monotonic workload symbol in finite state machine view. If a three-dimensional workload is monotonic, then that information is given to MPfd with the symbol shown in FIG. 39. There are three symbols attached to the three loops. These symbols mean that the work (represented as the work within a loop) changes monotonically and that this workload effect applies to vector “A”.

Particle Use Model

Concept: Particles are used to define discrete objects that move about a vector or array.

Use: Modeling physical phenomena, atoms, ray traces, fluids, etc.

Example Use: Computational fluid dynamics, changing image analysis.

Parallel Issue: Information sharing.

Action: Determine what to cross communicate.

A one-dimensional particle exchange with Cartesian topology generates the following version (shown in Tables 7 and 8) of a left-right exchange.

TABLE 7 INITIAL 1-DIMENSIONAL CONDITION BEFORE LEFT-RIGHT EXCHANGE (Cartesian Topology)
Node #           Node₁          Node₂          Node₃            Node₄
Work Elements    1, 2, 3, 4     5, 6, 7, 8     9, 10, 11, 12    13, 14, 15, 16

TABLE 8 1-DIMENSIONAL CONDITION AFTER ONE LEFT-RIGHT EXCHANGE
Node #           Node₁          Node₂          Node₃            Node₄
Work Elements    1, 2, 3, 5     4, 6, 7, 9     8, 10, 11, 13    12, 14, 15, 16

A one-dimensional particle exchange with a Ring topology generates the following version (shown in Tables 9 and 10) of a left-right exchange.

TABLE 9 INITIAL 1-DIMENSIONAL CONDITION BEFORE LEFT-RIGHT EXCHANGE (Ring Topology)
Node #           Node₁          Node₂          Node₃            Node₄
Work Elements    1, 2, 3, 4     5, 6, 7, 8     9, 10, 11, 12    13, 14, 15, 16

TABLE 10 1-DIMENSIONAL CONDITION AFTER ONE LEFT-RIGHT EXCHANGE (Ring Topology)
Node #           Node₁          Node₂          Node₃            Node₄
Work Elements    16, 2, 3, 5    4, 6, 7, 9     8, 10, 11, 13    12, 14, 15, 1

Note: In the Ring topology version of the left-right exchange, Node₄ edge information wraps around to Node₁, and Node₁ edge information wraps around to Node₄.
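The bookkeeping in Tables 7 through 10 can be reproduced with a short Python sketch. It assumes each node holds a list of work elements and that one exchange overwrites each node's edge slot with its neighbor's facing edge value, as the tables show (`left_right_exchange` is a hypothetical name); production halo exchanges usually use separate ghost cells instead.

```python
def left_right_exchange(nodes: list[list[int]], ring: bool = False) -> list[list[int]]:
    """One left-right exchange of edge elements between adjacent nodes."""
    result = [list(block) for block in nodes]
    links = [(k, k + 1) for k in range(len(nodes) - 1)]
    if ring:
        links.append((len(nodes) - 1, 0))  # last node wraps around to the first
    for left, right in links:
        # Each side receives the facing edge value of its neighbor; values are
        # read from the original blocks so every swap sees pre-exchange data.
        result[left][-1] = nodes[right][0]
        result[right][0] = nodes[left][-1]
    return result


initial = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(left_right_exchange(initial))             # Table 8 (Cartesian)
print(left_right_exchange(initial, ring=True))  # Table 10 (Ring)
```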

FIG. 41 (functional decomposition view) depicts a left-right exchange symbol (π) indicating no stride; the same exchange is shown in the finite state machine view of FIG. 42. If a one-dimensional vector is used to depict particles, then the π symbol shown in FIG. 41 is used.

If the processing of the vector skips one or more elements (called striding), then less data needs to be exchanged. The index calculation on the loop indicator can be modified to π+n to indicate striding. FIG. 43 depicts a left-right exchange with stride in functional decomposition view, and FIG. 44 depicts a left-right exchange in finite state machine view.

A two-dimensional particle exchange with Cartesian topology generates the following version (shown in Tables 11 and 12 below) of a next-neighbor exchange (edge-number exchange only).

TABLE 11 INITIAL 2-DIMENSIONAL CONDITION BEFORE NEXT-NEIGHBOR EXCHANGE (CARTESIAN TOPOLOGY)
      X1                                  X2
Y1    1, 2, 3, 4        5, 6, 7, 8        9, 10, 11, 12     13, 14, 15, 16
      17, 18, 19, 20    21, 22, 23, 24    25, 26, 27, 28    29, 30, 31, 32
Y2    33, 34, 35, 36    37, 38, 39, 40    41, 42, 43, 44    45, 46, 47, 48
      49, 50, 51, 52    53, 54, 55, 56    57, 58, 59, 60    61, 62, 63, 64

TABLE 12 2-DIMENSIONAL CONDITION AFTER ONE NEXT-NEIGHBOR EXCHANGE(CARTESIAN TOPOLOGY) X1 X2 Y1 1, 2, 3, 4 5, 6, 7, 9 8, 10, 11, 12 13,14, 15, 16 33, 34, 35, 36 37, 38, 39, (24, 41, 40), 42, 45, 46, 47, 48(25, 41, 40) 43, 44 Y2 17, 18, 19, 20 21, 22, 23, (40, 24, 25), 26, 29,30, 31, 32 (24, 25, 41) 27, 28 49, 50, 51, 52 53, 54, 55, 57 56, 58, 59,60 61, 62, 63, 64

Note: Parentheses indicate that the information there is overlaid such that the underlying code treats it as if it were adjacent memory.

A two-dimensional particle exchange with Cylindrical topology generates the following version (shown in Tables 13 and 14) of a next-neighbor exchange (edge-number exchange only).

TABLE 13
INITIAL 2-DIMENSIONAL CONDITION BEFORE NEXT-NEIGHBOR EXCHANGE (CYLINDRICAL TOPOLOGY)
      X1                              X2
Y1    1, 2, 3, 4      5, 6, 7, 8      9, 10, 11, 12   13, 14, 15, 16
      17, 18, 19, 20  21, 22, 23, 24  25, 26, 27, 28  29, 30, 31, 32
Y2    33, 34, 35, 36  37, 38, 39, 40  41, 42, 43, 44  45, 46, 47, 48
      49, 50, 51, 52  53, 54, 55, 56  57, 58, 59, 60  61, 62, 63, 64

TABLE 14
2-DIMENSIONAL CONDITION AFTER ONE NEXT-NEIGHBOR EXCHANGE (CYLINDRICAL TOPOLOGY)
X1 X2 Y1 49, 50, 51, 52 53, 54, 55, (8, 57, 56), 58, 59, 61, 62, 63, (9, 56, 57) 60 64 33, 34, 35, 36 37, 38, 39, (24, 41, 40), 42, 45, 46, 47, (25, 41, 40) 43, 44 48 Y2 17, 18, 19, 20 21, 22, 23, (40, 24, 25), 26, 29, 30, 31, (24, 25, 41) 27, 28 32 1, 2, 3, 4 5, 6, 7, (8, 57, 9) (56, 9, 8), 10, 11, 13, 14, 15, 12 16

A two-dimensional particle exchange with Toroid topology generates the version of a next-neighbor exchange (edge-number exchange only) shown in Tables 15 and 16 below.

TABLE 15
INITIAL 2-DIMENSIONAL CONDITION BEFORE NEXT-NEIGHBOR EXCHANGE (TOROID TOPOLOGY)
      X1                              X2
Y1    1, 2, 3, 4      5, 6, 7, 8      9, 10, 11, 12   13, 14, 15, 16
      17, 18, 19, 20  21, 22, 23, 24  25, 26, 27, 28  29, 30, 31, 32
Y2    33, 34, 35, 36  37, 38, 39, 40  41, 42, 43, 44  45, 46, 47, 48
      49, 50, 51, 52  53, 54, 55, 56  57, 58, 59, 60  61, 62, 63, 64

TABLE 16
2-DIMENSIONAL CONDITION AFTER ONE NEXT-NEIGHBOR EXCHANGE (TOROID TOPOLOGY)
X1 X2 Y1 (49, 16), 50, 51, 52 53, 54, 55, (8, 57, 56), 58, 61, 62, 63, (9, 56, 57) 59, 60 (64, 1) (33, 32), 34, 35, 36 37, 38, 39, (24, 41, 40), 42, 45, 46, 47, (25, 41, 40) 43, 44 (48, 17) Y2 (17, 48), 18, 19, 20 21, 22, 23, (40, 24, 25), 26, 29, 30, 31, (24, 25, 41) 27, 28 (32, 33) (1, 64), 2, 3, 4 5, 6, 7, (56, 9, 8), 10, 13, 14, (8, 57, 9) 11, 12 15, (16, 49)
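The Cartesian, Cylindrical, and Toroid versions differ only in which directions wrap around at the edge of the node grid. The sketch below computes each node's next neighbors on a grid of nodes; the per-axis wrap flags play the same role as the `periods` argument of MPI's `MPI_Cart_create`. Reading Tables 13 and 14, the Cylindrical case appears to wrap along Y only (the Y1 edge receives Y2 data) and not along X; that choice, like the names `TOPOLOGIES` and `next_neighbors`, is an assumption of this sketch.

```python
from typing import Dict, Optional, Tuple

Coord = Tuple[int, int]  # (x, y) node coordinates; X1/Y1 -> 0, X2/Y2 -> 1

# Per-axis wrap-around flags (x, y). The cylindrical choice is an assumption.
TOPOLOGIES = {
    "cartesian": (False, False),
    "cylindrical": (False, True),
    "toroid": (True, True),
}


def next_neighbors(coord: Coord, dims: Coord, topology: str) -> Dict[str, Optional[Coord]]:
    """Return the neighbor in each direction, or None where the grid edge does not wrap."""
    wraps = TOPOLOGIES[topology]
    result: Dict[str, Optional[Coord]] = {}
    for axis, name in ((0, "x"), (1, "y")):
        for step, sign in ((-1, "-"), (1, "+")):
            c = list(coord)
            c[axis] += step
            if 0 <= c[axis] < dims[axis]:
                result[name + sign] = (c[0], c[1])
            elif wraps[axis]:
                c[axis] %= dims[axis]  # wrap to the opposite edge of the grid
                result[name + sign] = (c[0], c[1])
            else:
                result[name + sign] = None  # physical boundary: nothing to exchange
    return result


for topo in TOPOLOGIES:
    print(topo, next_neighbors((0, 0), (2, 2), topo))
```

On a 2 x 2 grid the wrapped neighbor and the direct neighbor are the same node, so under Toroid topology each node exchanges with the same X neighbor across both of its X edges.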

FIG. 45 shows a next-neighbor exchange (no stride) in functional decomposition view; FIG. 46 shows a next-neighbor exchange (no stride) in finite state machine view; FIG. 47 shows a next-neighbor exchange symbol (with stride) in functional decomposition view; and FIG. 48 shows a next-neighbor exchange (with stride) in finite state machine view. If a two-dimensional matrix is used to depict particles, then the symbol shown in FIGS. 45 and 47 is used. A new state is automatically added when the system recognizes that a next-neighbor exchange is to be used. The data exchange is modified with the "stride" information, which indicates how much data to skip with each exchange.
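The automatic addition of a state can be pictured as a rewrite of the finite state machine: every transition into the process state that consumes the exchanged data is redirected through a new exchange state, which records the stride and then transitions unconditionally to the original state. This is a minimal sketch under those assumptions; the `FSM` class, the state names, and `add_exchange_state` are hypothetical and are not the system's actual representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FSM:
    # state name -> list of (transition-selection condition, next state)
    transitions: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)
    # per-state annotations, e.g. the stride attached to an exchange state
    attrs: Dict[str, dict] = field(default_factory=dict)


def add_exchange_state(fsm: FSM, target: str, stride: int = 1) -> str:
    """Hypothetical sketch: insert a next-neighbor exchange state before `target`."""
    exchange = f"exchange_before_{target}"
    for state, edges in list(fsm.transitions.items()):
        # Redirect every transition that entered `target` to the new state.
        fsm.transitions[state] = [
            (cond, exchange if nxt == target else nxt) for cond, nxt in edges
        ]
    fsm.transitions[exchange] = [("true", target)]  # unconditional transition
    fsm.attrs[exchange] = {"kind": "next-neighbor exchange", "stride": stride}
    return exchange


fsm = FSM(transitions={
    "control": [("i < n", "process"), ("i >= n", "return")],
    "process": [("true", "control")],
    "return": [],
})
add_exchange_state(fsm, "process", stride=2)
print(fsm.transitions["control"])  # [('i < n', 'exchange_before_process'), ('i >= n', 'return')]
```

Keeping the stride as an attribute of the exchange state, rather than of the process state, leaves the original process state and its code block untouched.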

A three-dimensional particle exchange with Cartesian topology generates the version of a next-neighbor exchange (edge-number exchange only) shown in Tables 17 and 18, below.

TABLE 17
INITIAL 3-DIMENSIONAL CONDITION BEFORE NEXT-NEIGHBOR EXCHANGE (CARTESIAN TOPOLOGY)
        X1                                      X2
Z1  Y1  1, 2, 3, 4          5, 6, 7, 8          9, 10, 11, 12       13, 14, 15, 16
        17, 18, 19, 20      21, 22, 23, 24      25, 26, 27, 28      29, 30, 31, 32
    Y2  33, 34, 35, 36      37, 38, 39, 40      41, 42, 43, 44      45, 46, 47, 48
        49, 50, 51, 52      53, 54, 55, 56      57, 58, 59, 60      61, 62, 63, 64
Z2  Y1  65, 66, 67, 68      69, 70, 71, 72      73, 74, 75, 76      77, 78, 79, 80
        81, 82, 83, 84      85, 86, 87, 88      89, 90, 91, 92      93, 94, 95, 96
    Y2  97, 98, 99, 100     101, 102, 103, 104  105, 106, 107, 108  109, 110, 111, 112
        113, 114, 115, 116  117, 118, 119, 120  121, 122, 123, 124  125, 126, 127, 128
Z3  Y1  129, 130, 131, 132  133, 134, 135, 136  137, 138, 139, 140  141, 142, 143, 144
        145, 146, 147, 148  149, 150, 151, 152  153, 154, 155, 156  157, 158, 159, 160
    Y2  161, 162, 163, 164  165, 166, 167, 168  169, 170, 171, 172  173, 174, 175, 176
        177, 178, 179, 180  181, 182, 183, 184  185, 186, 187, 188  189, 190, 191, 192
Z4  Y1  193, 194, 195, 196  197, 198, 199, 200  201, 202, 203, 204  205, 206, 207, 208
        209, 210, 211, 212  213, 214, 215, 216  217, 218, 219, 220  221, 222, 223, 224
    Y2  225, 226, 227, 228  229, 230, 231, 232  233, 234, 235, 236  237, 238, 239, 240
        241, 242, 243, 244  245, 246, 247, 248  249, 250, 251, 252  253, 254, 255, 256

TABLE 18
3-DIMENSIONAL CONDITION AFTER ONE NEXT-NEIGHBOR EXCHANGE (CARTESIAN TOPOLOGY)
X1 X2 Z1 Y1 65, 69, (8, 73), (13, 77), 66, 70, (10, 74), (14, 78), 67, 71, (11, 75), (15, 79), 68 (9, 72) (12, 76) (16, 80) 81, 85, (24, 40, 41, 89), (45, 93), 82, 86, (42, 90), (46, 94), 83, 87, (43, 91), (47, 95), 84 (25, 40, 41, 88) (44, 92) (48, 96) Y2 (17, 97), (21, 101), (24, 25, 40, 105), (29, 109), (18, 98), (22, 102), (26, 106), (30, 110), (19, 99), (23, 103), (27, 107), (31, 111), (20, 100) (24, 25, 41, 104) (28, 108) (32, 112) 113, 117, (56, 121), 125, 114, 118, 122, 126, 115, 119, 123, 127, 116 (57, 120) 124 128 Z2 Y1 (65, 1, 129), (69, 5, 133), (72, 9, 137), (77, 13, 141), (66, 2, 130), (70, 6, 134), (10, 74, 138), (78, 14, 142), (67, 3, 131), (71, 7, 135), (75, 11, 139), (79, 15, 143), (68, 4, 132) (73, 8, 136) (76, 12, 140) (80, 16, 144) (97, 17, 129) (98, (101, 21, 133), (102, (104, 105, 88, 25, 153), (109, 29, 157), 18, 130), (99, 19, 22, 134), (103, 23, (106, 26, 154), (110, 30, 158), 131), 135), (107, 27, 155), (111, 31, 159), (100, 20, 132) (104, 89, 105, 24, 136) (108, 28, 156) (112, 32, 160) Y2 (81, 33, 161), (85, 37, 165), (89, 104, 88, 41, 169), (93, 45, 173), (82, 34, 162), (86, 38, 166), (90, 42, 170), (94, 46, 174), (83, 35, 163), (87, 39, 167), (91, 43, 171), (95, 47, 175), (84, 36, 164) (88, 40, 168, 89, 105) (92, 44, 172) (96, 48, 176) (113, 49, 177), (117, 53, 181), (120, 57, 185), (125, 61, 189), (114, 50, 178), (118, 54, 182), (122, 58, 186), (126, 62, 190), (115, 51, 179), (119, 55, 183), (123, 59, 187), (127, 191), (116, 52, 180) (121, 56, 184) (124, 60, 188) (128, 64, 192) Z3 Y1 (129, 65, 193), (133, 69, 197), (136, 73, 201), (141, 77, 205), (130, 66, 194), (134, 70, 198), (138, 74, 202), (142, 78, 206), (131, 67, 195), (135, 71, 199), (139, 75, 203), (143, 79, 207), (132, 68, 196) (137, 72, 200) (140, 76, 204) (144, 80, 208) (161, 81, 209), (165, 85, 213), (169, 152, 168, 89, 217), (173, 93, 221), (162, 82, 210), (166, 86, 214), (170, 90, 218), (174, 94, 222), (163, 83, 211), (167, 87, 215), (171, 91, 219), (175, 95, 223), (164, 84, 212) (168, 153, 169, 88, (172, 92, 220) (176, 96, 223) 216) Y2 (145, 97, 225), (149, 101, 229), (153, 152, 168, 105, (157, 109, 237), (146, 98, 226), (150, 102, 230), 233), (158, 110, 238), (147, 99, 227), (151, 103, 231), (154, 106, 234), (159, 111, 239), (148, 100, 228) (152, 104, 232, 153, (155, 107, 235), (160, 112, 240) 169) (156, 108, 236) (177, 113, 241), (181, 117, 245), (184, 121, 249), (189, 125, 252), (178, 114, 242), (182, 118, 246), (186, 122, 249), (190, 126, 253), (179, 115, 243), (183, 119, 247), (187, 123, 250), (191, 127, 254), (180, 116, 244) (185, 120, 248) (188, 124, 251) (192, 128, 255) Z4 Y1 (193, 129), (197, 133), (200, 137), (205, 141), (194, 130), (198, 134), (202, 138), (206, 142), (195, 131), (199, 135), (203, 139), (207, 143), (196, 132) (201, 136) (204, 140) (208, 144) (225, 145), (229, 149), (233, 232, 216, 153), (237, 157), (226, 146), (230, 150), (234, 154), (238, 158), (227, 147), (231, 151), (235, 155), (239, 159), (228, 148) (232, 217, (236, 156) (240, 160) 233, 152) Y2 (209, 161), (213, 165), (217, 232, 216, 169), (221, 173), (210, 162), (214, 166), (218, 170), (222, 174), (211, 163), (215, 167), (219, 171), (223, 175), (212, 164) (216, 168, (220, 172) (224, 176) 217, 233) (241, 177), (245, 181), (248, 185), (253, 189), (242, 178), (246, 182), (250, 186), (254, 190), (243, 179), (247, 183), (251, 187), (255, 191), (244, 180) (249, 184) (252, 188) (256, 192)

FIG. 49 shows a 3-dimensional next-neighbor exchange symbol [* *] indicating no stride, in functional decomposition view; FIG. 50 shows a 3-dimensional next-neighbor exchange (no stride) in finite state machine view; FIG. 51 shows a 3-dimensional next-neighbor exchange (with stride) in functional decomposition view; and FIG. 52 shows a 3-dimensional next-neighbor exchange (with stride) in finite state machine view. If a three-dimensional matrix is used to depict particles, then the symbol shown in FIG. 49 is used.

FIG. 53 shows a 2-dimensional matrix with a 2-dimensional stencil for the 2-D next-n-neighbor exchange symbol (no stride) in functional decomposition view; FIG. 54 shows a 2-dimensional matrix with a 2-dimensional stencil for the 2-D next-n-neighbor exchange (no stride) in finite state machine view; and FIG. 55 shows a 2-dimensional matrix with a 2-dimensional stencil for the 2-D next-n-neighbor exchange symbol (with stride) in functional decomposition view. The next-neighbor exchange can be extended to a next-n-neighbor exchange. Frequently, the depth of the exchange is a function of the size of the stencil that is applied to the data. Along each dimension of the exchange, the exchange uses the number of elements that the stencil contains along that dimension. If this number of elements is greater than the discretization size, then the data must be shared across multiple nodes. Since the stencil is itself a vector or matrix, the symbol for a two-dimensional matrix with a two-dimensional stencil (shown in FIG. 53) can be used to generate a next-n-neighbor exchange.
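The sizing rule for a next-n-neighbor exchange can be written out per dimension. This is a minimal sketch, assuming that the exchange depth along a dimension is the stencil's element count along that dimension (as the text states) and that the discretization size is the number of elements each node holds along that dimension; the function name `next_n_neighbor_plan` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def next_n_neighbor_plan(stencil_shape: Sequence[int], local_shape: Sequence[int]) -> List[Tuple[int, int]]:
    """Hypothetical sketch: (exchange depth, nodes involved) for each dimension.

    depth = stencil elements along the dimension (per the text);
    nodes = how many neighboring nodes in that direction must contribute data,
            which exceeds 1 once depth is greater than the local (discretization) size.
    """
    if len(stencil_shape) != len(local_shape):
        raise ValueError("stencil and local block must have the same number of dimensions")
    plan = []
    for depth, local in zip(stencil_shape, local_shape):
        nodes = math.ceil(depth / local)
        plan.append((depth, nodes))
    return plan


print(next_n_neighbor_plan((3, 5), (4, 4)))  # [(3, 1), (5, 2)]
```

Many stencil codes exchange only the half-width of a centered stencil on each side; the sketch follows the text's wording and uses the full count, so `depth // 2` would be substituted under that convention.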

FIG. 56 shows a 2-dimensional matrix with a 2-dimensional stencil for the 2-D next-n-neighbor exchange (with stride) in finite state machine view. Since B cannot change (depicted by the lack of an accent mark) and has the same number of dimensions as A′, it is assumed to be a stencil. Note that the stencil must be smaller than the processed vector or matrix in every dimension; otherwise, the operation is considered a non-stenciled matrix operation, and the next-n-neighbor exchange does not apply.
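The rule for recognizing B as a stencil reduces to three checks. This is a minimal sketch, assuming that array shapes are available as tuples and that the presence of an accent mark is represented by a boolean flag; both representations and the function name `is_stencil` are hypothetical.

```python
from typing import Sequence


def is_stencil(a_shape: Sequence[int], b_shape: Sequence[int], b_changes: bool) -> bool:
    """Hypothetical sketch: treat B as a stencil for A' only if B cannot change
    (no accent mark), has the same number of dimensions as A', and is smaller
    than A' in every dimension. Otherwise the operation is a non-stenciled
    matrix operation and no next-n-neighbor exchange is generated."""
    if b_changes:                      # an accented B can change: not a stencil
        return False
    if len(b_shape) != len(a_shape):   # dimensionality must match A'
        return False
    return all(b < a for b, a in zip(b_shape, a_shape))


print(is_stencil((64, 64), (3, 3), b_changes=False))   # True
print(is_stencil((64, 64), (64, 3), b_changes=False))  # False: not smaller in every dimension
```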

Field Use Model

Concept: A field affects everything at once, so if the field is distributed over multiple nodes, then everything must communicate with everything.

Use: Modeling physical phenomena.

Example Use: Gravity modeling.

Parallel Issue: Information exchange.

Action: Determine what to cross-communicate.

Action Example: Perform an all-to-all exchange of data.
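The all-to-all exchange in the field model can be simulated directly: every node sends its block to every other node, so with n nodes there are n(n - 1) point-to-point messages and each node ends up holding the whole field. This is a minimal sketch with a hypothetical function name. In MPI terms, this gather-everything pattern corresponds to `MPI_Allgather`; `MPI_Alltoall` instead sends a distinct block to each destination.

```python
from typing import List, Tuple


def all_to_all_exchange(nodes: List[List[int]]) -> Tuple[List[List[int]], List[Tuple[int, int]]]:
    """Hypothetical sketch: every node sends its whole block to every other node.

    Returns the field as reassembled on each node (blocks in node order) and
    the list of (source, destination) messages that were sent.
    """
    n = len(nodes)
    messages = [(src, dst) for src in range(n) for dst in range(n) if src != dst]
    # received[dst][src] holds the block node `dst` got from node `src`;
    # each node starts out holding only its own block.
    received = [[nodes[k] if src == k else None for src in range(n)] for k in range(n)]
    for src, dst in messages:
        received[dst][src] = nodes[src]
    fields = [[x for block in blocks for x in block] for blocks in received]
    return fields, messages


blocks = [list(range(4 * k + 1, 4 * k + 5)) for k in range(4)]
fields, messages = all_to_all_exchange(blocks)
print(len(messages), fields[2] == list(range(1, 17)))  # 12 True
```

The quadratic message count is what makes the field case costly compared with the particle exchanges, where each node talks only to its next neighbors.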

FIG. 57 shows a 1-dimensional all-to-all exchange symbol (no stride) in functional decomposition view; FIG. 58 shows a 1-dimensional all-to-all exchange (no stride) in finite state machine view; FIG. 59 shows a 1-dimensional all-to-all exchange symbol (with stride) in functional decomposition view; and FIG. 60 shows a 1-dimensional all-to-all exchange (with stride) in finite state machine view. If a one-dimensional vector is used to depict a field, then the symbol shown in FIG. 57 is used.

FIG. 61 shows a 2-dimensional all-to-all exchange symbol (no stride) in functional decomposition view; FIG. 62 shows a 2-dimensional all-to-all exchange (no stride) in finite state machine view; FIG. 63 shows a 2-dimensional all-to-all exchange symbol (with stride) in functional decomposition view; and FIG. 64 shows a 2-dimensional all-to-all exchange (with stride) in finite state machine view. If a two-dimensional matrix is used to depict fields, then the symbol shown in FIG. 61 is used.

FIG. 65 shows a 3-dimensional all-to-all exchange symbol (no stride) in functional decomposition view; FIG. 66 shows a 3-dimensional all-to-all exchange (no stride) in finite state machine view; FIG. 67 shows a 3-dimensional all-to-all exchange symbol (with stride) in functional decomposition view; and FIG. 68 shows a 3-dimensional all-to-all exchange (with stride) in finite state machine view. If a three-dimensional matrix is used to depict fields, then the symbol shown in FIG. 65 is used.
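The figure descriptions above pair each use model (particle or field), dimensionality, and stride choice with one exchange symbol, and each finite state machine view is numbered one after its functional decomposition view. A lookup table makes that mapping explicit. This is a hypothetical sketch: the table name, keys, and labels are illustrative, while the figure numbers are the ones given in the text.

```python
from typing import Dict, Tuple

# (use model, dimensions, strided?) -> (exchange, functional decomposition figure)
EXCHANGE_SYMBOLS: Dict[Tuple[str, int, bool], Tuple[str, int]] = {
    ("particle", 1, False): ("left-right exchange", 41),
    ("particle", 1, True): ("left-right exchange", 43),
    ("particle", 2, False): ("next-neighbor exchange", 45),
    ("particle", 2, True): ("next-neighbor exchange", 47),
    ("particle", 3, False): ("3-D next-neighbor exchange", 49),
    ("particle", 3, True): ("3-D next-neighbor exchange", 51),
    ("field", 1, False): ("1-D all-to-all exchange", 57),
    ("field", 1, True): ("1-D all-to-all exchange", 59),
    ("field", 2, False): ("2-D all-to-all exchange", 61),
    ("field", 2, True): ("2-D all-to-all exchange", 63),
    ("field", 3, False): ("3-D all-to-all exchange", 65),
    ("field", 3, True): ("3-D all-to-all exchange", 67),
}


def select_exchange(use_model: str, ndim: int, strided: bool = False) -> Tuple[str, int, int]:
    """Return (exchange, functional decomposition FIG., finite state machine FIG.)."""
    try:
        name, fd_fig = EXCHANGE_SYMBOLS[(use_model, ndim, strided)]
    except KeyError:
        raise ValueError(f"no exchange symbol for {use_model!r} data in {ndim} dimensions") from None
    return name, fd_fig, fd_fig + 1  # the FSM view is the next figure number


print(select_exchange("field", 2))                   # ('2-D all-to-all exchange', 61, 62)
print(select_exchange("particle", 1, strided=True))  # ('left-right exchange', 43, 44)
```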

Certain changes may be made in the above methods and systems without departing from the scope of that which is described herein. It is to be noted that all matter contained in the above description or shown in the accompanying drawings is to be interpreted as illustrative and not in a limiting sense. The elements and steps shown in the present drawings may be modified in accordance with the methods described herein, and the steps shown therein may be sequenced in other configurations without departing from the spirit of the system thus described. The following claims are intended to cover all generic and specific features described herein, as well as all statements of the scope of the present method, system and structure, which, as a matter of language, might be said to fall therebetween.

What is claimed is:
 1. A method for performing functional decomposition to generate a computer-executable finite state machine (FSM), the method comprising: receiving, via manual interaction by a user, a graphical diagram of a software design, the software design comprising functions that are repetitively decomposed into hierarchical sub-functions including data transformations and control transformations until each of the data transformations consists of a respective linear code block, the graphical diagram comprising hierarchical decomposition levels each respectively representing the sub-functions, each decomposition level comprising: a control bubble indicating the control transformations within the software design, a process bubble indicating the data transformations within the software design, and a control flow indicator between the control and process bubbles having a transformation-selection condition associated therewith; translating the graphical diagram into a graphical FSM diagram wherein: at each of the decomposition levels, the control bubble is a first state indicator in the FSM diagram, the process bubble is a subsequent state indicator in the FSM diagram, and the control flow indicator is a state transition indicator in the graphical FSM diagram; and receiving a manipulation by the user of at least one of the graphical diagram and graphical FSM diagram.
 2. The method of claim 1, further comprising generating an FSM as an executable program constructed from the linear code blocks, wherein state transition-selection conditions associated with the control flow indicators are state transitions within the FSM.
 3. The method of claim 1, the graphical diagram further comprising a looping structure associated with at least one of the process bubbles.
 4. The method of claim 1, the process bubble comprising at least one of associated process bubbles and an unassociated process bubble; wherein the process bubbles are associated if the control bubble links together a pair of the process bubbles, otherwise the process bubbles are unassociated.
 5. The method of claim 4, the hierarchical decomposition levels comprising a higher-level decomposition level and a lower-level decomposition level; wherein if the higher-level decomposition level includes a looping structure, each unassociated process bubble within the lower-level decomposition level is automatically given the same looping structure.
 6. The method of claim 1, the graphical diagram further comprising an execution-stream identifier representing a corresponding program thread of each respective function.
 7. The method of claim 1, wherein the data transformations accept and generate data using at least one input and one output variable, and the control transformations include only transitions with no associated code blocks.
 8. The method of claim 1, wherein the control transformations include a non-event control item comprising a condition that changes the sequence of execution of program logic within the software design.
 9. The method of claim 8, wherein each said non-event control item is selected from the list consisting of if-then-else, go to, function calls, and function returns.
 10. The method of claim 1, wherein each of the control flow indicators comprises a function-order arrow and an associated condition constituting a regular conditional expression.
 11. The method of claim 1, wherein data is passed only to a data transformation via a data store.
 12. The method of claim 1, wherein the lowest level of the hierarchical decomposition levels indicates that none of the data transformations represented in the lowest level can be decomposed into a set of data transformations grouped together with a control transformation.
 13. The method of claim 1, wherein: only one said control bubble exists at each decomposition level; each said process bubble is invoked by a control bubble; each said process bubble transmits and receives data from a data store via a data flow; each said control bubble receives and uses data as part of determining which said process bubble is to be called; and each said process bubble returns control to its calling control bubble.
 14. The method of claim 1, further comprising determining processing topology by associating, with input data of at least one of the control and process transformations, information indicating the preferred topology for executing the finite state machine.
 15. The method of claim 1, further comprising determining processing parallelism by associating, with input data of at least one of the control and process transformations, data intent information indicating an intended use of the data.